GSL multiroot iteration trying NaN?

I am trying to find the right parameters to input to my code to produce the desired results. Instead of guessing and checking, I am using a root finder to find the parameters that give the desired results. There are two variables that are free to vary, but I was having difficulty running the root finder. I changed the code to solve for each variable individually and found that I was having trouble optimizing one variable.
It seems that the problem is that gsl_multiroot_fsolver_iterate is trying nan for x1 after the first iteration. At least, that is the value of x1 I see inside the function() call after that point when I add a printf statement for x1.
The simulation I am running only allows values of x1 between 0 and 1. It could be that this is causing the issue, though I check in the simulation that x1 is between 0 and 1, and it never reports a problem except when x1 is nan. Is there any way to set a range for the values the iteration tries for x1? And does anyone know why the iteration tries nan for x1?
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_multiroots.h>

struct rparams {
    double target1;
};

int function(const gsl_vector *x, void *params, gsl_vector *f);

/* Print the current iterate and residual (helper not shown in the original post) */
static void print_state(unsigned int iter, gsl_multiroot_fsolver *s)
{
    printf("iter = %3u x1 = % .6f f = % .3e\n", iter,
           gsl_vector_get(s->x, 0), gsl_vector_get(s->f, 0));
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s target1\n", argv[0]);
        return EXIT_FAILURE;
    }
    double target1;
    sscanf(argv[1], "%lf", &target1);

    const gsl_multiroot_fsolver_type *T;
    gsl_multiroot_fsolver *s;
    int status;
    unsigned int iter = 0;
    const size_t n = 1;

    struct rparams p;
    p.target1 = target1;
    gsl_multiroot_function f = {&function, n, &p};

    double x_init[1] = {0.1};
    gsl_vector *x = gsl_vector_alloc(n);
    gsl_vector_set(x, 0, x_init[0]);

    T = gsl_multiroot_fsolver_hybrid;
    s = gsl_multiroot_fsolver_alloc(T, n);
    gsl_multiroot_fsolver_set(s, &f, x);
    print_state(iter, s);

    do {
        iter++;
        status = gsl_multiroot_fsolver_iterate(s);
        print_state(iter, s);
        /* check if solver is stuck */
        if (status)
            break;
        status = gsl_multiroot_test_residual(s->f, 1e-7);
    } while (status == GSL_CONTINUE && iter < 1000);

    printf("status = %s\n", gsl_strerror(status));
    gsl_multiroot_fsolver_free(s);
    gsl_vector_free(x);
    return 0;
}

int function(const gsl_vector *x, void *params, gsl_vector *f)
{
    double target1 = ((struct rparams *) params)->target1;
    double x1 = gsl_vector_get(x, 0);
    /* Run simulation here using x1 parameter */
    /* Assign output to temp1, which I am trying to match to target1 */
    double temp1 = x1; /* placeholder so the example compiles; replace with the simulation output */
    const double y1 = temp1 - target1;
    gsl_vector_set(f, 0, y1);
    return GSL_SUCCESS;
}

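Be careful when designing the function whose root you want to find. As a test, I tried a function with a constant output, and that caused the algorithm to produce NaNs. A plausible explanation is that the hybrid solver estimates the Jacobian by finite differences. If the output does not change with x1 (a constant function, or a simulation whose output is flat over some range), that Jacobian is zero, and the Newton-type step computed from it is undefined.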

If you only need to find the root of a single equation, you can use the gsl_roots library instead of gsl_multiroots. gsl_roots provides several bracketing algorithms (bisection, false position and Brent's method), for which you specify an interval instead of an initial guess. Suppose you know your root lies inside the interval (0, 1) and your function has opposite signs at the two endpoints. You can then use that interval as the bracket, and the algorithm will never leave it. gsl_root_fsolver_set reports an error if the endpoints do not straddle a root. A minimal, complete example in C++ demonstrating the bisection method is below. If you can't use C++11 lambda functions, you'll have to define the objective function as a named function, as you did in your original question.
#include <iostream>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_roots.h>
using namespace std;

int main(void)
{
    // Set the solver type (bisection method)
    gsl_root_fsolver *s = gsl_root_fsolver_alloc(gsl_root_fsolver_bisection);

    // Use a lambda to define the objective function.
    // This is a parabola with the equation: y = (x-1)^2 - 1
    // It has roots at x = 0 and x = 2.
    gsl_function F;
    F.function = [](double x, void *) { return ((x - 1) * (x - 1)) - 1; };
    F.params = nullptr;  // the objective takes no parameters

    // Initialize the solver with the bracket [0.5, 10]: f(0.5) < 0 < f(10),
    // so exactly one root (x = 2) lies inside it.
    gsl_root_fsolver_set(s, &F, 0.5, 10.0);

    // Run the solver until the bracket is narrower than 0.001
    int status;
    do {
        gsl_root_fsolver_iterate(s);
        double r = gsl_root_fsolver_root(s);
        double x_low = gsl_root_fsolver_x_lower(s);
        double x_high = gsl_root_fsolver_x_upper(s);
        status = gsl_root_test_interval(x_low, x_high, 0, 0.001);
        if (status == GSL_SUCCESS)
            cout << "Converged" << endl;
        cout << "x_low = " << x_low;
        cout << "; x_high = " << x_high;
        cout << "; root = " << r << endl;
    } while (status == GSL_CONTINUE);

    gsl_root_fsolver_free(s);
    return status;
}

Wrong sign in numerical integration, possible precision issue

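I need to integrate the following function:

$$W(z) = \int_0^\infty \frac{e^x}{(e^x + 1)^2}\,\ln\left(\frac{e^{z^2/(4x)} + e^{-x}}{e^{z^2/(4x)} - 1}\right)dx,$$

where z > 0. The problem is that the integrand is very small for large z, so high precision is required in the integration. So far, I have written the integrand as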
#include <math.h>

/* Integrand w(x, z), with the logarithms taken separately to avoid overflow */
double integrand__W(double x, double z){
    double arg = z*z/(4.0*x);
    double num = exp(arg+x)+1;
    double den1 = expm1(arg);
    double den2 = exp(x);
    /* fall back to the asymptotic logarithm when exp() overflows */
    num = isinf(num) ? arg+x : log(num);
    den1 = isinf(den1) ? arg : log(den1);
    den2 = x; //log(exp(x))=x
    double t1 = num-den1-den2;
    num = exp(x);
    double den = exp(x)+1;
    double t2 = isinf(den) ? exp(-x) : num/(den*den);
    return t1*t2;
}
For numerical integration, I'm using Cubature, a simple C package for adaptive multidimensional integration:
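#include <cmath>
#include <iostream>
#include "cubature.h"
using namespace std;

double integrand__W(double x, double z); // defined above

//integrator
struct fparams {
    double z;
};

// lower limit of the x integral (assumed 0, matching the Mathematica code below)
const double a_int = 0.0;

// Integrand in t on [0, 1] after substituting x = a_int + t/(1-t), dx = dt/(1-t)^2
int inf_W(unsigned ndim, const double *x, void *fdata, unsigned fdim, double *fval){
    struct fparams * fp = (struct fparams *)fdata;
    double z = fp->z;
    double t = x[0];
    double aux = integrand__W(a_int+t*pow(1.0-t, -1.0), z)*pow(1.0-t, -2.0);
    // replace non-finite values (e.g. at t = 1) by 0
    if (!std::isnan(aux) && !std::isinf(aux))
    {
        fval[0] = aux;
    }
    else
    {
        fval[0] = 0.0;
    }
    return 0;
}

int main(){
    double z = 100.0;
    //range integration 1D
    size_t maxEval = 1e7;
    double xl[1] = { 0 };
    double xu[1] = { 1 };
    double W, W_ERR;
    struct fparams params = {z};
    hcubature(1, inf_W, &params, 1, xl, xu, maxEval, 0, 1e-5, ERROR_INDIVIDUAL, &W, &W_ERR);
    cout << "z: " << z << " | " << W << " , " << W_ERR << endl;
    return 0;
}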
where the integration over the semi-infinite interval is made possible by the change of variables

$$\int_a^\infty f(x)\,dx = \int_0^1 f\left(a + \frac{t}{1-t}\right)\frac{dt}{(1-t)^2}.$$
Analytically, I know that the integrand is non-negative, so the integral itself should be non-negative. However, I'm getting incorrect results due to a lack of accuracy:
z: 100 | -3.97632e-17 , 1.24182e-16
In Mathematica, working with high precision, I can get the desired result:
w[x_, z_] := E^x/(E^x + 1)^2 Log[(E^(z^2/(4 x)) + E^-x)/(E^(z^2/(4 x)) - 1)]
W[z_?NumericQ] := NIntegrate[w[x, z], {x, 0, ∞},
WorkingPrecision -> 40,
Method -> "LocalAdaptive"]
W[100]
(* 4.679853458969239635780655689865016458810*10^-43 *)
My question: Is there any way to write my integrand such that I can reach the required precision? Thanks.
There are integration schemes which only use positive weights, resulting in nonnegative integral values if the evaluated function values of the integrand are all nonnegative. Some other integration schemes permit negative weights, resulting in a possibly higher accuracy for integration. Cubature probably uses one of those.
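Your actual integral value is very close to 0 for z = 100, and that's what you're getting, too. The reported error estimate (1.24e-16) is larger than the magnitude of the result, so -3.98e-17 is consistent with the true value of about 4.68e-43 within the stated error. There's really nothing wrong with the integration scheme. If you absolutely need nonnegativity, one option is to simply set negative results to 0.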
After asking the same question in a different community, I got two suggestions that seem to work:
Avoiding subtractive cancellation
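Manipulate the integral a little bit first. Since

$$\ln\left(\frac{e^{z^2/(4x)} + e^{-x}}{e^{z^2/(4x)} - 1}\right) = \ln\left(1 + \frac{1 + e^{-x}}{e^{z^2/(4x)} - 1}\right),$$

the logarithm can be evaluated with log1p and expm1 instead of as a difference of nearly equal logarithms. Then rewrite the integrand as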
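#include <math.h>

double integrand__W(double x, double z){
    double arg = z*z/(4.0*x);
    // ln((e^arg + e^-x)/(e^arg - 1)) = log1p((1 + e^-x)/expm1(arg)): no cancellation
    double t1 = log1p((exp(-x)+1)/expm1(arg));
    // t2 = e^x/(e^x+1)^2, which tends to e^-x when exp(x) overflows
    double num = exp(x);
    double den = exp(x)+1;
    double t2 = isinf(den) ? exp(-x) : num/(den*den);
    return t1*t2;
}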
Use of Exp-Sinh quadrature
This integration scheme is provided by the Boost library:
#include <iostream>
#include <cmath>
#include <limits>
#include <boost/math/quadrature/exp_sinh.hpp>
using boost::math::quadrature::exp_sinh;
using std::exp;
using std::expm1;
using std::log;
int main() {
    exp_sinh<double> integrator;
    double z = 100.0;
    auto f = [z](double x) {
        // k1 equals e^x/(e^x+1)^2
        double k1 = 1.0/(2 + exp(-x) + exp(x));
        double t = z*z/(4*x);
        double log_arg;
        if (t > 1) {
            // numerator and denominator divided by e^t, so nothing overflows
            log_arg = (1 + exp(-x)*exp(-t))/(1 - exp(-t));
        } else {
            log_arg = (exp(t) + exp(-x))/expm1(t);
        }
        return k1*log(log_arg);
    };
    // integrate(f, tol, ...) without explicit limits covers (0, inf)
    double termination = std::sqrt(std::numeric_limits<double>::epsilon());
    double error;
    double L1;
    double Q = integrator.integrate(f, termination, &error, &L1);
    std::cout << "Q = " << Q << ", error estimate: " << error << "\n";
}
I can't say much about the mathematics (I have a love/hate relationship with math), but higher precision can be achieved with long double and the corresponding functions in the standard math library (expl, log1pl, expm1l, and so on).
But long double does not necessarily mean higher precision: depending on your compiler and system architecture, it may be the same as double, 80-bit extended precision, or something wider.
More info:
https://en.wikipedia.org/wiki/Long_double
https://en.wikipedia.org/wiki/Extended_precision

When calculating adjoint sensitivities in SUNDIALS/CVODES, how does one handle discontinuities in the forward solution?

I am using CVODES to calculate adjoint sensitivities for a very basic equation with solution discontinuity at t=1. When integrating the forward solution, I integrate over each interval with CVodeF() in CV_ONE_STEP mode in a loop, calling CVodeReInit() to restart the integration at the discontinuity. This yields the correct forward solution. I then call CVodeB() for the backward integration for an adjoint sensitivity analysis.
My question regards forward integration restarts and how to handle them during the backward integration. At the beginning of my program, I call CVodeAdjInit() with Nd = 1, so I believe I am saving checkpoints at every integration step. However, when examining the checkpoints and also trying to recall the forward solution (with CVodeGetAdjY()) at and around t=1, I don't see the correct jump discontinuity at the correct time.
Are these restarts explicitly saved (in the checkpointing scheme or elsewhere)? Are there additional CVODES functions I should call to inform the backward integrator of these restarts? Any general guidance with using CVODES for an adjoint sensitivity analysis (in the presence of forward solution discontinuities) would be much appreciated.
I've included example code that illustrates this. We integrate dy/dt = -0.05*y from t = 0 to t = 2, with an impulse of 0.1 applied at t = 1. In this example, I am not doing anything with the adjoint state lambda - the main purpose is to illustrate how the forward solution y is recalled during the backward integration.
#include <stdio.h>
#include <cvodes/cvodes.h>
#include <nvector/nvector_serial.h> /* access to serial N_Vector */
#include <sunmatrix/sunmatrix_dense.h> /* access to dense SUNMatrix */
#include <sunlinsol/sunlinsol_dense.h> /* access to dense SUNLinearSolver */
/* Number of equations in system (1 for this example) */
#define NEQ 1
/* Accessor macros */
#define Ith(v, i) NV_Ith_S(v,i) /* i-th vector component, i=0..NEQ-1 */
#define IJth(A,i,j) SM_ELEMENT_D(A,i,j) /* IJth numbers rows,cols 0..NEQ-1 */
/* Decay rate*/
#define R 0.05
static int rhs(realtype t, N_Vector y, N_Vector ydot, void *user_data);
static int Jac(realtype t, N_Vector y, N_Vector fy, SUNMatrix J,
void *user_data, N_Vector tmp1, N_Vector tmp2, N_Vector tmp3);
static int rhs_adj(realtype t, N_Vector y, N_Vector lam, N_Vector lamdot, void *user_data);
static int Jac_adj(realtype t,
N_Vector y, N_Vector lam, N_Vector fyB,
SUNMatrix JB, void *user_data,
N_Vector tmp1B, N_Vector tmp2B, N_Vector tmp3B);
int main() {
int maxord = 5; //BDF order (CVodeSetMaxOrd takes an int)
double abstol = 1e-8;
double reltol = 1e-8;
double abstol_adj = 1e-8;
double reltol_adj = 1e-8;
int adj_steps = 1; //number of integration steps between two consecutive checkpoints
int ncheck; //number of checkpoints
int indexB; //index for the backward problem
/* impulse at t=1*/
double impulse_time = 1.0;
double impulse_amount = 0.1;
/* integrate to t=2 */
double final_time = 2.0;
/* y = state */
N_Vector y = N_VNew_Serial(NEQ);
Ith(y, 0) = 1.0; //init condit.
/* lambda in adjoint equation. Needed for the backward integration, though will be ignored for this example */
N_Vector lam = N_VNew_Serial(NEQ);
Ith(lam, 0) = 0.0; //init condit.
/* initialize cvodes, set tolerances, etc. */
void *cvode_mem = CVodeCreate(CV_BDF);
CVodeSetMaxOrd(cvode_mem, maxord);
CVodeInit(cvode_mem, rhs, 0.0, y);
CVodeSStolerances(cvode_mem, reltol, abstol);
/* Create SUNMatrix and linear solver for the forward problem and set Jacobian fcn*/
SUNMatrix A = SUNDenseMatrix(NEQ, NEQ);
SUNLinearSolver LS = SUNLinSol_Dense(y, A);
CVodeSetLinearSolver(cvode_mem, LS, A);
CVodeSetJacFn(cvode_mem, Jac);
/* Inform the forward problem there will be an adjoint problem to solve as well */
CVodeAdjInit(cvode_mem, adj_steps, CV_HERMITE);
/* Initialization steps for adj. problem*/
CVodeCreateB(cvode_mem, CV_BDF, &indexB);
CVodeSetMaxOrdB(cvode_mem, indexB, maxord);
CVodeInitB(cvode_mem, indexB, rhs_adj, final_time, lam);
CVodeSStolerancesB(cvode_mem, indexB, reltol_adj, abstol_adj);
/* Create SUNMatrix and linear solver for the adj problem and attach */
SUNMatrix AB = SUNDenseMatrix(NEQ, NEQ);
SUNLinearSolver LSB = SUNLinSol_Dense(y, AB);
CVodeSetLinearSolverB(cvode_mem, indexB, LSB, AB);
CVodeSetJacFnB(cvode_mem, indexB, Jac_adj);
/* The forward integration */
realtype time; // updated by each integration step by CVodeF
double current_time = 0.0;
double goal_time = impulse_time; // we will first integrate to the time of the impulse
int impulse_applied = 0;
int init_needed = 0;
while (current_time < final_time) {
/* need to re-initialize cvodes after the jump */
if (init_needed) {
CVodeReInit(cvode_mem, current_time, y);
printf("Re-init after impulse\n");
init_needed = 0;
}
while (current_time < goal_time) {
/* main forward integration step */
CVodeF(cvode_mem, goal_time, y, &time, CV_ONE_STEP, &ncheck);
current_time = time;
printf("t = %10.8f, y = %10.8f | ncheck = %d\n", current_time, Ith(y, 0), ncheck);
}
/* apply impulse */
if (!impulse_applied && impulse_time <= current_time) {
current_time = impulse_time;
CVodeGetDky(cvode_mem, impulse_time, 0, y);
printf("****** Before impulse: t = %10.8f, y = %10.8f \n", current_time, Ith(y, 0));
Ith(y, 0) += impulse_amount;
printf("****** After impulse: t = %10.8f, y = %10.8f \n", current_time, Ith(y, 0));
init_needed = 1;
impulse_applied = 1;
goal_time = final_time;
}
}
/* Now, integrate backwards in time for the adjoint problem */
goal_time = 1.0;
printf("\nPerforming backward integration ...\n");
while (current_time > goal_time) {
/* main backward integration step */
CVodeB(cvode_mem, goal_time, CV_ONE_STEP);
/* need to call CVodeGetB to get the time of the last CVodeB step */
CVodeGetB(cvode_mem, indexB, &time, lam);
/* obtain interpolated forward solution at current time as well */
CVodeGetAdjY(cvode_mem, time, y);
printf("t = %10.8f, y = %10.8f \n", time, Ith(y, 0));
current_time = time;
}
printf("Around impulse: \n");
double times[5] = {1.002, 1.001, 1.0, 0.999, 0.998};
for (int i = 0; i < 5; i++){
CVodeGetAdjY(cvode_mem, times[i], y);
printf("t = %10.8f, y = %10.8f \n", times[i], Ith(y, 0));
}
/* cleanup */
N_VDestroy(y);
N_VDestroy(lam);
CVodeFree(&cvode_mem);
SUNLinSolFree(LS);
SUNLinSolFree(LSB);
SUNMatDestroy(A);
SUNMatDestroy(AB);
return 0;
}
static int rhs(realtype t, N_Vector y, N_Vector ydot, void *user_data) {
/* exp decay */
double r = R;
Ith(ydot, 0) = -r * Ith(y, 0);
return (0);
}
static int Jac(realtype t, N_Vector y, N_Vector fy, SUNMatrix J,
void *user_data, N_Vector tmp1, N_Vector tmp2, N_Vector tmp3) {
/* Jacobian of rhs */
double r = R;
IJth(J, 0, 0) = -r;
return (0);
}
static int rhs_adj(realtype t, N_Vector y, N_Vector lam, N_Vector lamdot, void *user_data) {
/* RHS of adjoint problem
* Note: the adjoint problem is lam' = -(J^*)lam - g_y^*, where J is the Jacobian matrix of the main problem.
* For this example, we take g = 0 everywhere */
double r = R;
Ith(lamdot, 0) = r * Ith(lam, 0);
return (0);
}
static int Jac_adj(realtype t,
N_Vector y, N_Vector lam, N_Vector fyB,
SUNMatrix JB, void *user_data,
N_Vector tmp1B, N_Vector tmp2B, N_Vector tmp3B) {
/* Jacobian of adjoint problem */
double r = R;
IJth(JB, 0, 0) = r;
return (0);
}
During the forward integration, we have the output (around t = 1)
...
t = 0.77864484, y = 0.96181582 | ncheck = 18
t = 0.94641606, y = 0.95378132 | ncheck = 19
t = 1.11418727, y = 0.94581393 | ncheck = 20
****** Before impulse: t = 1.00000000, y = 0.95122937
****** After impulse: t = 1.00000000, y = 1.05122937
Re-init after impulse
t = 1.00197548, y = 1.05112555 | ncheck = 21
t = 1.00395097, y = 1.05102173 | ncheck = 22
...
During the backward phase (here, y is obtained via CVodeGetAdjY),
...
t = 1.00934328, y = 1.05073841
t = 1.00395097, y = 1.05102173
t = 1.00197548, y = 1.05112555
t = 0.98222065, y = 0.95207536
The recalled y value at t = 1.00197548 is correct at that time (this is the first step after the impulse taken during the forward integration), but if I then query y at times near the impulse (again with CVodeGetAdjY):
Around impulse:
t = 1.00200000, y = 0.95113425
t = 1.00100000, y = 0.95118181
t = 1.00000000, y = 0.95122937
t = 0.99900000, y = 0.95127693
t = 0.99800000, y = 0.95132450
the impulse is not apparent. The recalled y at t = 1.0 is the pre-impulse value. It appears as though, even though CVodeReInit() is called immediately after the impulse during the forward integration, the post-impulse y is not "seen" during the backward integration. Moreover, if the backward integrator had required any steps between checkpoints, the interpolated y between 1.00197548 and t = 1.0 would likely be off.
In other words, my question is: After a re-init of the forward problem, is there a way to ensure that such a restart is saved and accessible from the checkpoint data?

Golden Section Method in C

I am pretty new to coding and I have been having an impossible time trying to find online help writing a C code that will use the golden section method (which apparently the GNU Scientific Library has, although I haven't had any luck finding it) to find the minimum of functions that Newton's method of minimization fails for.
Specifically, I want to input an x-value as a starting point and have the code output the function's minimum value and the x coordinate of the minimum point. My function is f(x) = x^20. I am also allowed some error (< 10^-3).
I don't even know where to begin with this, I have been ALL over the internet and haven't found anything helpful. I would seriously appreciate some help as to where I might find more information, or how I might implement this method.
Edit:
This is my code as of now:
#include <gsl/gsl_errno.h> /* Defines GSL_SUCCESS, etc. */
#include <gsl/gsl_math.h>
#include <gsl/gsl_min.h>
int minimize_convex(gsl_function *F,double a, double b, double *x_min, double tol)
{
int status;
double h = (b - a) * .0000001; /* Used to test slope at boundaries */
/* First deal with the special cases */
if (b - a < tol)
{
*x_min = b;
status = GSL_SUCCESS;
}
/* If the min is at a, then the derivative at a is >= 0. Test for
* this case. */
else if (GSL_FN_EVAL(F, a + h) - GSL_FN_EVAL(F, a) >= 0)
{
*x_min = a;
status = GSL_SUCCESS;
}
/* If the min is at b, then the derivative at b is >= 0. Test for
* this case. */
else if (GSL_FN_EVAL(F, b - h) - GSL_FN_EVAL(F, b) >= 0)
{
*x_min = b;
status = GSL_SUCCESS;
}
else
{
/* Choose x_guess so that it's value is less than either of the two
* endpoint values. Since we've got this far, we know that at least
* of of F(a + h) and F(b - h) has this property. */
double x_guess;
x_guess = (GSL_FN_EVAL(F, a + h) < GSL_FN_EVAL(F, b - h)) ?
a + h : b - h;
int iter = 0, max_iter = 200;
const gsl_min_fminimizer_type *T;
gsl_min_fminimizer *s;
T = gsl_min_fminimizer_goldensection;
s = gsl_min_fminimizer_alloc(T);
gsl_min_fminimizer_set(s, F, x_guess, a, b);
do
{
iter++;
status = gsl_min_fminimizer_iterate(s); /* perform iteration */
status =
gsl_min_test_interval(a, b, tol, 0.0); /* |a - b| < tol? */
a = gsl_min_fminimizer_x_lower(s);
b = gsl_min_fminimizer_x_upper(s);
if (status == GSL_SUCCESS)
{
*x_min = gsl_min_fminimizer_x_minimum(s); /* current est */
}
}
while (status == GSL_CONTINUE && iter < max_iter);
gsl_min_fminimizer_free(s);
}
return status;
}
double f(double x, void *params)
{
double *p = (double *) params;
return (x^(50)) + *p;
}
double C = 0.0;
int main (void)
{
double m = 0.0, result;
double a = -1.0, b = 1.0;
double epsilon = 0.001;
int exit_val;
gsl_function F;
F.function = &f;
F.params = &C;
exit_val = minimize_convex(&F, a, b, m, &result, epsilon);
printf("Minimizer: %g\n", result);
printf("Function value: %g\n", f(result, &C));
printf("%d\n", exit_val);
return 0;
}
I am getting the following errors:
try.c:69:14: error: invalid operands to binary
expression ('double' and 'double')
return (x^(50)) + *p;
try.c:81:54: error: too many arguments to function
call, expected 5, have 6
exit_val = minimize_convex(&F, a, b, m, &result, epsilon);
Any thoughts?
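GSL has a generic one-dimensional minimizer that can use several methods to achieve the minimization; the documentation describes how to use it. You can select the golden section method by passing gsl_min_fminimizer_goldensection as the minimizer type. Note that the GSL minimizer needs a bracketing interval [a, b] and an initial guess x inside it with f(x) < f(a) and f(x) < f(b), rather than just a starting point.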

How to implement nested loops in cuda thrust

I currently have to run a nested loop as follows:
for(int i = 0; i < N; i++){
for(int j = i+1; j <= N; j++){
compute(...)//some calculation here
}
}
I've tried leaving the first loop on the CPU and running the second loop on the GPU, but that results in too many memory accesses. Is there any other way to do it, for example with thrust::reduce_by_key?
The whole program is here:
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/random.h>
#include <cmath>
#include <ctime>
#include <iostream>
#include <iomanip>
#define N 1000000
// define a 2d point pair
typedef thrust::tuple<float, float> Point;
// return a random Point in [0,1)^2
Point make_point(void)
{
static thrust::default_random_engine rng(12345);
static thrust::uniform_real_distribution<float> dist(0.0f, 1.0f);
float x = dist(rng);
float y = dist(rng);
return Point(x,y);
}
struct sqrt_dis: public thrust::unary_function<Point, double>
{
float x, y;
double tmp;
sqrt_dis(float _x, float _y): x(_x), y(_y){}
__host__ __device__
float operator()(Point a)
{
tmp =(thrust::get<0>(a)-x)*(thrust::get<0>(a)-x)+\
(thrust::get<1>(a)-y)*(thrust::get<1>(a)-y);
tmp = -1.0*(sqrt(tmp));
return (1.0/tmp);
}
};
int main(void) {
clock_t t1, t2;
double result = 0.0; // accumulated sum, must start at zero
t1 = clock();
// allocate some random points in the unit square on the host
thrust::host_vector<Point> h_points(N);
thrust::generate(h_points.begin(), h_points.end(), make_point);
// transfer to device
thrust::device_vector<Point> points = h_points;
thrust::plus<double> binary_op;
float init = 0;
for(int i = 0; i < N; i++){
Point tmp_i = points[i];
float x = thrust::get<0>(tmp_i);
float y = thrust::get<1>(tmp_i);
// start at i+1: only pairs (i, j) with j > i, skipping the zero self-distance
result += thrust::transform_reduce(points.begin()+i+1,\
points.end(),sqrt_dis(x,y),\
init,binary_op);
std::cout<<"result"<<i<<": "<<result<<std::endl;
}
t2 = clock()-t1;
std::cout<<"result: ";
std::cout.precision(10);
std::cout<< result <<std::endl;
std::cout<<"run time: "<<(double)t2/CLOCKS_PER_SEC<<"s"<<std::endl;
return 0;
}
EDIT: Now that you have posted an example, here is how you could solve it:
You have n 2D points stored in a linear array like this (here n=4)
points = [p0 p1 p2 p3]
Based on your code I assume you want to calculate:
result = f(p0, p1) + f(p0, p2) + f(p0, p3) +
f(p1, p2) + f(p1, p3) +
f(p2, p3)
Where f() is your distance function which needs to be executed m times in total:
m = (n-1)*n/2
in this example: m=6
You can look at this problem as a triangular matrix:
[ p0 p1 p2 p3 ]
[ p1 p2 p3 ]
[ p2 p3 ]
[ p3 ]
Transforming this matrix into a linear vector with m elements while leaving out the diagonal elements results in:
[p1 p2 p3 p2 p3 p3]
The index k of an element in this vector ranges over [0, m-1].
Index k can be remapped to columns and rows of the triangular matrix to k -> (i,j):
i = n - 2 - floor(sqrt(-8*k + 4*n*(n-1)-7)/2.0 - 0.5)
j = k + i + 1 - n*(n-1)/2 + (n-i)*((n-i)-1)/2
i is the row and j is the column.
In our example:
0 -> (0, 1)
1 -> (0, 2)
2 -> (0, 3)
3 -> (1, 2)
4 -> (1, 3)
5 -> (2, 3)
Now you can put all this together and execute a modified distance functor m times which applies the aforementioned mapping to get the corresponding pairs based on the index and then sum up everything.
I modified your code accordingly:
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/generate.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform_reduce.h>
#include <thrust/random.h>
#include <math.h>
#include <iostream>
#include <stdio.h>
#include <stdint.h>
#define PRINT_DEBUG
typedef float Float;
// define a 2d point pair
typedef thrust::tuple<Float, Float> Point;
// return a random Point in [0,1)^2
Point make_point(void)
{
static thrust::default_random_engine rng(12345);
static thrust::uniform_real_distribution<Float> dist(0.0, 1.0);
Float x = dist(rng);
Float y = dist(rng);
return Point(x,y);
}
struct sqrt_dis_new
{
typedef thrust::device_ptr<Point> DevPtr;
DevPtr points;
const uint64_t n;
__host__
sqrt_dis_new(uint64_t n, DevPtr p) : n(n), points(p)
{
}
__device__
Float operator()(uint64_t k) const
{
// calculate indices in triangular matrix
const uint64_t i = n - 2 - floor(sqrt((double)(-8*k + 4*n*(n-1)-7))/2.0 - 0.5);
const uint64_t j = k + i + 1 - n*(n-1)/2 + (n-i)*((n-i)-1)/2;
#ifdef PRINT_DEBUG
printf("%llu -> (%llu, %llu)\n", (unsigned long long)k, (unsigned long long)i, (unsigned long long)j);
#endif
const Point& p1 = *(points.get()+j);
const Point& p2 = *(points.get()+i);
const Float xm = thrust::get<0>(p1)-thrust::get<0>(p2);
const Float ym = thrust::get<1>(p1)-thrust::get<1>(p2);
return 1.0/(-1.0 * sqrt(xm*xm + ym*ym));
}
};
int main()
{
const uint64_t N = 4;
// allocate some random points in the unit square on the host
thrust::host_vector<Point> h_points(N);
thrust::generate(h_points.begin(), h_points.end(), make_point);
// transfer to device
thrust::device_vector<Point> d_points = h_points;
const uint64_t count = (N-1)*N/2;
std::cout << count << std::endl;
thrust::plus<Float> binary_op;
const Float init = 0.0;
Float result = thrust::transform_reduce(thrust::make_counting_iterator((uint64_t)0),
thrust::make_counting_iterator(count),
sqrt_dis_new(N, d_points.data()),
init,
binary_op);
std::cout.precision(10);
std::cout<<"result: " << result << std::endl;
return 0;
}
It depends on your compute function, which you do not specify.
Usually you flatten the nested loops and launch the kernel over a 2D grid covering every combination of i and j, provided the computations are independent.
Have a look at the Thrust examples and identify use cases similar to your problem.

R freezes when I call a C code

I wrote a small C function to do random walk Metropolis sampling, which I call from R. When I run it, R freezes, and I am not sure which part of the code is incorrect. I am following this Peng and Leeuw tutorial (page 6). As a disclaimer: I don't have much experience with C and have only some basic knowledge of C++.
#----C code --------
#include <R.h>
#include <Rmath.h>
void mcmc(int *niter, double *mean, double *sd, double *lo_bound,
double *hi_bound, double *normal)
{
int i, j;
double x, x1, h, p;
x = runif(-5, 5);
for(i=0; i < *niter; i++) {
x1 = runif(*lo_bound, *hi_bound);
while((x1 + x) > 5 || (x1 + x) < -5)
x1 = runif(*lo_bound, *hi_bound);
h = dnorm(x+x1, *mean, *sd, 0)/dnorm(x, *mean, *sd, 0);
if(h >= 1)
h = 1;
p = runif(0, 1);
if(p < h)
x += x1;
normal[i] = x;
}
}
#-----R code ---------
foo_C<-function(mean, sd, lo_bound, hi_bound, niter)
{
result <- .C("mcmc", as.integer(niter), as.double(mean), as.double(sd),
as.double(lo_bound), as.double(hi_bound), normal=double(niter))
result[["normal"]]
}
After compiling it:
dyn.load("foo_C.so")
foo_C(0, 1, -0.5, 0.5, 100)
FOLLOW UP:
The while loop is where the problem lies, but the root of the problem seems to be the function runif, which is supposed to generate a random number between a lower bound and an upper bound. Instead, it seems to randomly return either the upper bound (5) or the lower bound (-5).
You need to follow the instructions in Writing R Extensions, section 6.3 "Random number generation", and call GetRNGstate(); before you call R's random number generation routines. You also need to call PutRNGstate(); when you're finished.
The reason your code started working is likely because you called set.seed in the R session before you called your mcmc C function.
So your C code should look like this:
#include <R.h>
#include <Rmath.h>
void mcmc(int *niter, double *mean, double *sd, double *lo_bound,
double *hi_bound, double *normal)
{
int i;
double x, x1, h, p;
GetRNGstate();
x = runif(-5.0, 5.0);
for(i=0; i < *niter; i++) {
x1 = runif(*lo_bound, *hi_bound);
while((x1 + x) > 5.0 || (x1 + x) < -5.0) {
x1 = runif(*lo_bound, *hi_bound);
//R_CheckUserInterrupt();
}
h = dnorm(x+x1, *mean, *sd, 0)/dnorm(x, *mean, *sd, 0);
if(h >= 1)
h = 1;
p = runif(0, 1);
if(p < h)
x += x1;
normal[i] = x;
}
PutRNGstate();
}
