calculating double integrals in R quickly - c

I'm looking for a solution for a double integral that is faster than
integrate(function(y) {
  sapply(y, function(y) {
    integrate(function(x) myfun(x, y), llim, ulim)$value
  })
}, llim, ulim)
with, for example,
myfun <- function(x,y) cos(x+y)
llim <- -0.5
ulim <- 0.5
I found an old paper that referred to a FORTRAN program called quad2d, but beyond that I could only find some MATLAB help pages. So I'm looking for a C or FORTRAN library that can do double integrals quickly (i.e. without the sapply loop) and that can be called from R. All ideas are very much appreciated, as long as they're GPL compatible.
If the solution involves calling other functions from libraries that are already shipped with R, I'd love to hear about them as well.

The cubature package does 2D (and N-D) integration using an adaptive algorithm. It should outperform simpler approaches for most integrands.
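Under the hood, the R cubature package wraps Steven Johnson's C cubature library, which also speaks to the "C library callable from R" part of the question. If you wanted to call that C code directly, a rough sketch for your example integrand might look like this (in recent releases the entry point is hcubature; older ones called it adapt_integrate, so check cubature.h for the exact signature):
#include <stdio.h>
#include <math.h>
#include "cubature.h"   /* header from the stand-alone C cubature library */

/* The same integrand as myfun(x, y), in the library's calling convention:
   write f(x) into fval[0] and return 0 on success. */
static int myfun_c(unsigned ndim, const double *x, void *fdata,
                   unsigned fdim, double *fval)
{
    fval[0] = cos(x[0] + x[1]);
    return 0;
}

int main(void)
{
    double xmin[2] = {-0.5, -0.5}, xmax[2] = {0.5, 0.5};
    double val, err;

    hcubature(1, myfun_c, NULL,      /* one output dimension, no extra data   */
              2, xmin, xmax,         /* two input dimensions, integration box */
              0, 0, 1e-8,            /* maxEval = 0 (no limit), abs/rel error */
              ERROR_INDIVIDUAL, &val, &err);

    /* exact value is 2 - 2*cos(1), about 0.9194 */
    printf("integral = %.10f (estimated error %g)\n", val, err);
    return 0;
}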

The pracma package that Joshua pointed out contains a version of quad2d.
quad2d(myfun, llim, ulim, llim, ulim)
This gives the same answer, within numerical tolerance, as your loop, using the example function.
By default, with your example function, quad2d is slower than the loop. If you drop n down, you can make it faster, but I guess it depends upon how smooth your function is, and how much accuracy you are willing to sacrifice for speed.

Related

GCD computation issues with GNU MP Library

I have a question about GNU MP; could you please help me figure out how to proceed?
I'm using "The GNU Multiple Precision Arithmetic Library", edition 5.1.1, on Windows
(MinGW gcc + MSYS).
There is an mpz_gcd function to calculate the gcd of two integers:
void mpz_gcd (mpz_t rop, mpz_t op1, mpz_t op2);
As far as I can tell from the documentation, there are a couple of algorithms implemented in GNU MP for computing the greatest common divisor. Among them:
Binary GCD
Lehmer’s algorithm
Subquadratic GCD
The algorithm that is used seems to be selected automatically, based on the input size of integers.
Currently, the binary algorithm is used for GCD only when N < 3.
For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD (Half GCD) function
as a generalization to Lehmer’s algorithm.
So, I guess there are at least three different approaches to get gcd(a, b).
The main problem for me: I want to specify which algorithm to use by myself.
I want to compare the execution times of these algorithms on random large inputs (e.g. 10^5 bits) to find out some common trends: at what point does the binary GCD become worse than Lehmer's method, is the HGCD/Lehmer generalization really better than straightforward Lehmer, and so on.
Is there any simple way to specify which algorithm to use? Any way to pull the algorithm out of the library, or to modify some #define? Is it possible to do what I want without recompiling the library? I'm just a beginner here and I don't feel able to figure out everything going on inside the library.
P.S. In case anybody is interested in what came out of this, I've put some code on GitHub: https://github.com/int000h/gcd_gcc
This is a good time to be reading source code. GMP is open source—take advantage of that!
In mpn/generic/gcd.c you'll find the function which selects the GCD algorithm (this is actually a public function, it appears in the documentation):
mp_size_t
mpn_gcd (mp_ptr gp, mp_ptr up, mp_size_t usize, mp_ptr vp, mp_size_t n)
{
...
if (ABOVE_THRESHOLD (n, GCD_DC_THRESHOLD)) {
...
You can see that there are three main branches to the function, each ending with a return statement. Each branch corresponds to a different GCD algorithm. You can copy and paste the code into your own application and modify it so you can specify exactly which algorithm you want. Tips:
You can get rid of the #ifdefs. Assume TUNE_GCD_P is not defined.
This is an mpn_* function instead of an mpz_* function. It's lower-level: you have to explicitly allocate space for outputs, for example. You may also wish to copy the code from the higher-level function, mpz_gcd().
You'll need to extract prototypes for the internal functions, like mpn_hgcd_matrix_adjust(). Just copy the prototypes out of the GMP source code. Don't worry, internal functions appear to be exported from the shared library (they generally shouldn't be, but they are, so you're fine).
No need to recompile the library, but you will need to do a little bit of work here.
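Before digging into the mpn internals, it may also be worth timing the public mpz_gcd on operands of the sizes you care about, as a baseline for whatever GMP's automatic selection already achieves. A rough sketch (the bit size and repetition count are arbitrary; compile with something like gcc bench.c -lgmp):
#include <stdio.h>
#include <time.h>
#include <gmp.h>

int main(void)
{
    const mp_bitcnt_t bits = 100000;   /* ~10^5-bit operands, as in the question */
    const int reps = 100;

    gmp_randstate_t rng;
    gmp_randinit_default(rng);

    mpz_t a, b, g;
    mpz_inits(a, b, g, NULL);

    /* random operands of the requested size */
    mpz_urandomb(a, rng, bits);
    mpz_urandomb(b, rng, bits);

    clock_t t0 = clock();
    for (int i = 0; i < reps; i++)
        mpz_gcd(g, a, b);              /* GMP chooses the algorithm internally */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("%d gcd calls on %lu-bit inputs: %.3f s\n",
           reps, (unsigned long)bits, secs);

    mpz_clears(a, b, g, NULL);
    gmp_randclear(rng);
    return 0;
}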

how to incorporate C or C++ code into my R code to speed up an MCMC program using a Metropolis-Hastings algorithm

I am seeking advice on how to incorporate C or C++ code into my R code to speed up an MCMC program that uses a Metropolis-Hastings algorithm. I am using an MCMC approach to model the likelihood, given various covariates, that an individual will be assigned a particular rank in a social status hierarchy by a 3rd party (the judge): each judge (approx. 80, across 4 villages) was asked to rank a group of individuals (approx. 80, across 4 villages) based on their assessment of each individual's social status. Therefore, for each judge I have a vector of ranks corresponding to their judgement of each individual's position in the hierarchy.
To model this I assume that, when assigning ranks, judges are basing their decisions on the relative value of some latent measure of an individual's utility, u. Given this, it can then be assumed that a vector of ranks, r, produced by a given judge is a function of an unobserved vector, u, describing the utility of the individuals being ranked, where the individual with the kth highest value of u will be assigned the kth rank. I model u, using the covariates of interest, as a multivariate normally distributed variable and then determine the likelihood of the observed ranks, given the distribution of u generated by the model.
In addition to estimating the effects of, at most, 5 covariates, I also estimate hyperparameters describing the variance between judges and items. Therefore, for every iteration of the chain I evaluate a multivariate normal density approximately 8-10 times. As a result, 5000 iterations can take up to 14 hours. Obviously, I need to run it for far more than 5000 iterations, so I need a way to dramatically speed up the process. Given this, my questions are as follows:
(i) Am I right to assume that the best speed gains will be had by running some, if not all of my chain in C or C++?
(ii) assuming the answer to question 1 is yes, how do I go about this? For example, is there a way for me to retain all my R functions, but simply do the looping in C or C++: i.e. can I call my R functions from C and then do looping?
(iii) I guess what I really want to know is how best to approach the incorporation of C or C++ code into my program.
First make sure your slow R version is correct. Debugging R code might be easier than debugging C code. Done that? Great. You now have correct code you can compare against.
Next, find out what is taking the time. Use Rprof to run your code and see what is taking the time. I did this for some code I inherited once, and discovered it was spending 90% of the time in the t() function. This was because the programmer had a matrix, A, and was doing t(A) in a zillion places. I did one tA=t(A) at the start, and replaced every t(A) with tA. Massive speedup for no effort. Profile your code first.
Now, you've found your bottleneck. Is it code you can speed up in R? Is it a loop that you can vectorise? Do that. Check your results against your gold standard correct code. Always. Yes, I know it's hard to compare algorithms that rely on random numbers, so set the seeds the same and try again.
Still not fast enough? Okay, now maybe you need to rewrite parts (the lowest level parts, generally, and those that were taking the most time in the profiling) in C or C++ or Fortran, or if you are really going for it, in GPU code.
Again, really check the code is giving the same answers as the correct R code. Really check it. If at this stage you find any bugs anywhere in the general method, fix them in what you thought was the correct R code and in your latest version, and rerun all your tests. Build lots of automatic tests. Run them often.
Read up about code refactoring. It's called refactoring because if you tell your boss you are rewriting your code, he or she will say 'why didn't you write it correctly first time?'. If you say you are refactoring your code, they'll say "hmmm... good". THIS ACTUALLY HAPPENS.
As others have said, Rcpp is made of win.
A complete example using R, C++ and Rcpp is provided by this blog post, which was inspired by a post on Darren Wilkinson's blog (and he has more follow-ups). The example is also included with recent releases of Rcpp in a directory RcppGibbs and should get you going.
I have a blog post which discusses exactly this topic which I suggest you take a look at:
http://darrenjw.wordpress.com/2011/07/31/faster-gibbs-sampling-mcmc-from-within-r/
(this post is more relevant than the post of mine that Dirk refers to).
I think the best way currently to integrate C or C++ is the Rcpp package by Dirk Eddelbuettel. You can find a lot of information on his website. There is also a talk at Google, available on YouTube, that might be interesting.
Check out this project:
https://github.com/armstrtw/rcppbugs
Also, here is a link to the R/Fin 2012 talk:
https://github.com/downloads/armstrtw/rcppbugs/rcppbugs.pdf
I would suggest benchmarking each step of the MCMC sampler and identifying the bottleneck. If you put each full conditional or M-H step into a function, you can use the R compiler package, which might give you a 5%-10% speed gain. The next step is to use Rcpp.
I think it would be really nice to have a general-purpose Rcpp function which generates just a single draw using the M-H algorithm, given a likelihood function.
However, with Rcpp some things become difficult if you only know the R language: non-standard random distributions (especially truncated ones) and using arrays. You have to think more like a C programmer there.
The multivariate normal is actually a big issue in R. dmvnorm is very inefficient and slow; dmnorm is faster, but in some models it would give me NaNs sooner than dmvnorm does.
Neither takes an array of covariance matrices, so it is impossible to vectorize the code in many instances. As long as you have a common covariance matrix and common means, however, you can vectorize, which is the R-ish strategy for speeding things up (and the opposite of what you would do in C).
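For reference, here is roughly what the plumbing looks like if you move an inner likelihood loop to C using R's plain C API and .Call(), without Rcpp; the function name sum_log_dnorm and the toy likelihood are made up for illustration, and the equivalent Rcpp code is shorter:
/* mh_inner.c -- hypothetical sketch of an inner loop moved to C.
   Build:     R CMD SHLIB mh_inner.c
   From R:    dyn.load("mh_inner.so")          # .dll on Windows
              .Call("sum_log_dnorm", x, 0.0, 1.0)                  */
#include <R.h>
#include <Rinternals.h>
#include <Rmath.h>

/* Sum of normal log-densities: the kind of likelihood evaluation that
   dominates an M-H step.  Assumes x is a numeric vector and mu, sigma
   are numeric scalars. */
SEXP sum_log_dnorm(SEXP x, SEXP mu, SEXP sigma)
{
    int n = length(x);
    double *px = REAL(x);
    double m = asReal(mu), s = asReal(sigma);
    double total = 0.0;

    for (int i = 0; i < n; i++)
        total += dnorm(px[i], m, s, /* give_log = */ 1);

    return ScalarReal(total);
}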

Matrix solving with C (within CUDA)

As part of a larger problem, I need to solve small linear systems (i.e. NxN where N ~ 10), so using the relevant CUDA libraries doesn't make any sense in terms of speed.
Unfortunately, it's also unclear to me how to go about solving such systems without pulling in the big guns like GSL, Eigen, etc.
Can anyone point me in the direction of a dense matrix solver (Ax=B) in straight C?
For those interested, the basic structure of the generator for this section of code is:
ndarray = some.generator(N, N)
for v in range(N):
    B[v] = _F(v) * constant
    for x in range(N):
        A[v, x] = -_F(v) * ndarray[x, v]
Unfortunately I have approximately zero knowledge of higher mathematics, so any advice would be appreciated.
UPDATE: I've been working away at this, and have a near-solution that runs but isn't giving correct results. Anyone lurking is welcome to check out what I've got so far on pastebin.
I'm using Crout Decomposition with Pivoting which seems to be the most general approach. The idea for this test is that every thread does the same work. Boring I know, but the plan is that the matrixcount variable is increased, actual data is put in, and each thread solves the small matrices individually.
Thanks for everyone who's been checking on this.
POST-ANSWER UPDATE: Finished the matrix solving code for CPU and GPU operation, check out my lazy-writeup here
CUDA won't help here, that's true. Matrices like that are just too small for it.
What you do to solve a system of linear equations is LU decomposition:
http://en.wikipedia.org/wiki/LU_decomposition
http://mathworld.wolfram.com/LUDecomposition.html
Or, even better, a QR decomposition, computed for example with Householder reflections or the Gram-Schmidt process.
http://en.wikipedia.org/wiki/QR_decomposition#Computing_the_QR_decomposition
Solving the linear equation becomes easy afterwards, but I'm afraid there always is some "higher mathematics" (linear algebra) involved. That, and there are many (many!) C libraries out there for solving linear equations. Doesn't seem like "big guns" to me.
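To give an idea of scale: for N around 10, Gaussian elimination with partial pivoting (effectively an LU solve) is only a few dozen lines of straight C. A minimal sketch with a fixed N for clarity; in the CUDA setting, each thread would run something like this on its own small matrix:
#include <math.h>
#include <stdio.h>

#define N 4   /* small fixed size for the sketch; N ~ 10 in the question */

/* Solve A x = b in place by Gaussian elimination with partial pivoting.
   A is overwritten with its elimination factors, b with the solution.
   Returns 0 on success, -1 if a pivot is numerically zero. */
static int solve(double A[N][N], double b[N])
{
    for (int k = 0; k < N; k++) {
        /* choose the largest pivot in column k */
        int p = k;
        for (int i = k + 1; i < N; i++)
            if (fabs(A[i][k]) > fabs(A[p][k]))
                p = i;
        if (fabs(A[p][k]) < 1e-12)
            return -1;                        /* singular (or nearly so) */
        if (p != k) {                         /* swap rows k and p */
            for (int j = 0; j < N; j++) {
                double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        /* eliminate below the pivot */
        for (int i = k + 1; i < N; i++) {
            double m = A[i][k] / A[k][k];
            for (int j = k; j < N; j++)
                A[i][j] -= m * A[k][j];
            b[i] -= m * b[k];
        }
    }
    /* back substitution */
    for (int i = N - 1; i >= 0; i--) {
        for (int j = i + 1; j < N; j++)
            b[i] -= A[i][j] * b[j];
        b[i] /= A[i][i];
    }
    return 0;
}

int main(void)
{
    double A[N][N] = {{4, 1, 0, 0},
                      {1, 4, 1, 0},
                      {0, 1, 4, 1},
                      {0, 0, 1, 4}};
    double b[N] = {1, 2, 3, 4};

    if (solve(A, b) == 0)
        for (int i = 0; i < N; i++)
            printf("x[%d] = %g\n", i, b[i]);
    return 0;
}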

Trying to use Cumulative Distribution Function in GSL

Hey guys, I'm trying to compute the cumulative distribution function of the standard normal distribution for a formula in C using GSL (the GNU Scientific Library).
I've installed and included gsl but am having trouble understanding how to use it.
I think the function I need is:
double gsl_ran_lognormal (const gsl_rng * r, double zeta, double sigma)
The formula I have only has one number that I would pass into a cdf function so I'm not quite sure what to do here. (This is probably because of my crappy understanding of statistics)
I would appreciate it if anyone could lend me a hand on how to get the CDF using GSL with one input variable.
Documentation only says:
This function returns a random variate from the lognormal distribution. The distribution function is,
p(x)\,dx = \frac{1}{x \sqrt{2\pi\sigma^2}} \exp\left(-\frac{(\ln x - \zeta)^2}{2\sigma^2}\right) dx
for x > 0.
Basically, could someone explain what gsl_rng, zeta, and sigma should be?
EDIT: Ok, I think that zeta should be 0 (mu) and sigma should be 1 (std dev) to make it normal? Is that right? What is gsl_rng?
gsl_rng is a pointer to an initialized (and possibly custom-seeded) random number generator.
See for example http://www.csse.uwa.edu.au/programming/gsl-1.0/gsl-ref_16.html
Tyler,
I hope your problem is solved already. I am not a programming guru myself, but I'll try to help. I think there are several points.
What you need is gsl_cdf_gaussian_P. The other thing (gsl_ran_lognormal) is inappropriate for two reasons.
1) It is a random number generator, not a cumulative distribution function. That means it gives you numbers following a particular distribution, rather than a probability, which is what you need.
2) It refers to the lognormal distribution, while you want the normal one.
Once you have a normal cumulative distribution function, you can set the mean to 0 and the variance to 1 to make it standard normal.
I hope this clarifies the situation. If not, I am here again in the morning.
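For what it's worth, the call itself is a one-liner once GSL is installed; a minimal sketch (compile with something like gcc cdf.c -lgsl -lgslcblas -lm):
#include <stdio.h>
#include <gsl/gsl_cdf.h>

int main(void)
{
    double x = 1.96;

    /* standard normal CDF: P(X <= x) for X ~ N(0, 1) */
    double p = gsl_cdf_ugaussian_P(x);

    /* same thing with an explicit standard deviation (the mean is fixed at 0) */
    double p2 = gsl_cdf_gaussian_P(x, 1.0);

    printf("Phi(%g) = %g (%g)\n", x, p, p2);   /* both about 0.975 */
    return 0;
}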
Your function is for generating a random number with a lognormal distribution. If you are looking for the cumulative distribution you need to look in the "Special Functions" section of the GSL manual, section 7.15.

What are a few time-consuming operations in C?

I'm looking to write a quick benchmark program that can be compiled and run on various machines. Rather than using commercially or freely available options, I'd rather have my own to play around with threading and algorithm optimization techniques.
I have a couple that I use already, which include recursively calculating the nth number of the Fibonacci sequence, and seeding and calling rand() a few thousand times.
Are there any other algorithms that are relatively simple, but at the same time computationally-intensive (and possibly math-related)?
(Note that these operations will be implemented in the C language.)
The Ackermann function is usually a fun one, but don't give it very large inputs if you want it to finish in your lifetime.
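A minimal sketch; the inputs here are deliberately small, since both the runtime and the recursion depth explode very quickly (bumping n up by one or two already makes a big difference, and ack(4, 2) will never finish):
#include <stdio.h>

/* Naive recursive Ackermann function: tiny code, enormous amounts of work. */
static unsigned long ack(unsigned long m, unsigned long n)
{
    if (m == 0) return n + 1;
    if (n == 0) return ack(m - 1, 1);
    return ack(m - 1, ack(m, n - 1));
}

int main(void)
{
    printf("ack(3, 10) = %lu\n", ack(3, 10));   /* 8189 */
    return 0;
}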
Fractals (at various resolutions). There is some fractal source in C (without OpenGL) around, or see the sketch below.
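If you don't want to hunt for existing source, a graphics-free Mandelbrot escape-time loop is a compact, embarrassingly parallel benchmark: sum the iteration counts over a grid so the work cannot be optimized away. A minimal sketch (the grid size and iteration limit are arbitrary knobs):
#include <stdio.h>

int main(void)
{
    const int width = 2000, height = 2000, max_iter = 1000;
    long long total = 0;   /* depends on every pixel, so the loops must run */

    for (int py = 0; py < height; py++) {
        for (int px = 0; px < width; px++) {
            /* map the pixel to roughly [-2, 1] x [-1.5, 1.5] in the complex plane */
            double cr = -2.0 + 3.0 * px / width;
            double ci = -1.5 + 3.0 * py / height;
            double zr = 0.0, zi = 0.0;
            int it = 0;
            while (zr * zr + zi * zi <= 4.0 && it < max_iter) {
                double tmp = zr * zr - zi * zi + cr;
                zi = 2.0 * zr * zi + ci;
                zr = tmp;
                it++;
            }
            total += it;
        }
    }
    printf("total iterations: %lld\n", total);
    return 0;
}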
I know you said you wanted to make your own, but perhaps you could draw upon existing benchmarks for inspiration. The Computer Language Benchmarks Game has run many programming languages through a set of benchmarks. Perhaps you can get some ideas by looking at their benchmarks.
Some quick ideas off the top of my head:
Matrix multiplication: multiplying two large matrices is relatively computationally intensive, though you will have to take caching into account (see the sketch below)
Generating prime numbers
Integer factorization
Numerical methods for solving ODEs (Runge-Kutta, for example)
Inverting big matrices.
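For the matrix multiplication idea above, a naive starting point might look like this sketch (the sizes and timing method are arbitrary); swapping the loop order or adding blocking is then a direct way to explore the caching effects mentioned in the list:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 512   /* arbitrary size; increase it to make the run longer */

static double A[N][N], B[N][N], C[N][N];

int main(void)
{
    /* fill the inputs with something non-trivial */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = (double)rand() / RAND_MAX;
            B[i][j] = (double)rand() / RAND_MAX;
        }

    clock_t t0 = clock();

    /* naive O(N^3) triple loop; the i-k-j order is already friendlier to the
       cache than the textbook i-j-k order */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i][j] += A[i][k] * B[k][j];

    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* print something that depends on the result so it can't be optimized away */
    printf("C[0][0] = %f, elapsed %.3f s\n", C[0][0], secs);
    return 0;
}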
You could compute big primes or factorize integers.
Take a look at the NAS Parallel Benchmarks. These were originally written by NASA in Fortran for supercomputers using MPI (and are still available that way), but there are also C, Java, and OpenMP implementations available now.
Most of these are very computationally intensive, as they're intended to be representative of numerical algorithms used in scientific computing.
Try calculating thousands or millions of digits of pi. There are quite a few formulas for that task.
There are some really nice ones on Project Euler; they are all math-related and can be made as time-consuming as you want by using larger values.
Finding prime numbers is considered quite time-consuming.
Check out the benchmarks from the language shootout: http://shootout.alioth.debian.org/
However, benchmarks are only benchmarks: they don't necessarily tell you a lot about the real world and can, on the contrary, be misleading.
If you want to try parallelism, do lots of matrix math. The size of the matrices you can use will be limited by memory, but you can do as many iterations as you want.
This will stress the SIMD instructions that modern CPUs come with.
This does a lot of addition (use a 64-bit counter so it doesn't overflow, mark it volatile so the compiler can't collapse the loops, and scale the INT_MAX bounds down if you actually want it to finish):
#include <iostream>
#include <climits>

int main() {
    volatile long long c = 0;
    for (int n = 0; n < INT_MAX; n++)
        for (int m = 0; m < INT_MAX; m++)
            c++;
    std::cout << c << '\n';
}
You could try a tsort (Turbo Sort) with a very large input set. I understand this to be a common operation.
Heuristics for NP-complete problems are a fun way to get some CPU-intensive code. You could code a "solution" :) for one of Karp's NP-complete problems.
