Related
I found an article, The secret to understanding recursion, which left me very confused. It suggests that it is unnecessary to trace all the invocations of a recursive function. It also says:
A programmer defining a recursive function usually does not think explicitly about the sequence of invocations that results from calling it.
I do not understand this. Can you explain?
In context, he’s saying that, as long as each step gets you closer to the base case, you’re going to get there, and you don’t need to do a step-by-step walkthrough of the algorithm to realize that.
You might have seen factorials as an example? If you know that the n! = n×(n-1)! step is correct, and you know that the 1! = 1 step is correct, you don’t need to do all the arithmetic to get from 10! = 10×9! down to 10! = 10×9×8 ... in order to verify the algorithm. Since each step is correct, and n gets smaller each time, you’re going to get to the base case and you can prove it just from that.
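To make that concrete, here is a minimal sketch of the factorial example in C (assuming n stays small enough that the result fits in the return type):

    /* Each piece is locally correct: the base case handles 1! = 1,
       the recursive step applies n! = n * (n-1)!, and n shrinks on
       every call, so the recursion must reach the base case. */
    unsigned long factorial(unsigned int n)
    {
        if (n <= 1)                    /* base case: 1! = 1 */
            return 1;
        return n * factorial(n - 1);   /* recursive step: n! = n * (n-1)! */
    }

You can convince yourself this is correct by checking exactly those two lines, without ever tracing factorial(10) by hand.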
Before defining a recursive function, a programmer usually wants to understand how recursion works in the first place. The article, despite its title, seems focused on how to define a recursive function rather than on understanding how recursion works, so in my opinion the article's title is a bit misleading.
As already commented, in order to understand recursion, a simple recursive function is usually used as a learning example (factorial, Fibonacci, ...). A programmer doesn't need to trace through every level, but may consider what happens at a few levels just above the base case, and also at the initial call and one or two levels down.
Once recursion is understood, then defining a function just needs to follow the rules mentioned in the article.
I use R and I implemented a Monte Carlo simulation in R which takes a long time because of the for loops. Then I realized that I can do the for loops in C, using the R API. So I generate my vectors and matrices in R, then I call functions from C (which do the for loops), and finally I present my results in R. However, I only know the basics of C and I cannot figure out how to translate some functions to C. For instance, I start with a function in R like this:
t=sample(1:(P*Q), size=1)
How can I do this in C? Also I have an expression in R:
A.q=phi[,which(q==1)]
How can I use the "which" expression in C?
Before you start writing C code, you would be better off rewriting your R code to make it run faster. sample is vectorised. Can you move the call to it outside the loop? That should speed things up. Even better, can you get rid of the loop entirely?
Also, you don't need to use which when you are indexing. R accepts logical vectors as indices. Compare:
A.q=phi[,which(q==1)]
A.q=phi[,q==1]
Finally, I recommend not calling your variables t or q since there are functions with those names. Try giving your variables descriptive names instead - it will make your code more readable.
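That said, if you do eventually move those two lines to C, a rough translation might look like the sketch below (P, Q and q are carried over from the question; note this uses the C library RNG rather than R's, and rand() % n is slightly biased unless RAND_MAX + 1 is a multiple of n):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
        int P = 4, Q = 5;                 /* example sizes */
        srand((unsigned) time(NULL));     /* seed once, at program start */

        /* t = sample(1:(P*Q), size = 1): a random integer in 1..P*Q */
        int t = rand() % (P * Q) + 1;

        /* A.q = phi[, q == 1]: C has no "which"; loop over the
           columns of phi and record those whose q entry equals 1 */
        int q[5] = {0, 1, 0, 1, 1};
        int keep[5], nkeep = 0;
        for (int j = 0; j < Q; j++)
            if (q[j] == 1)
                keep[nkeep++] = j;        /* column indices to copy */

        printf("t = %d, kept %d columns\n", t, nkeep);
        return 0;
    }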
I am seeking advice on how to incorporate C or C++ code into my R code to speed up a MCMC program, using a Metropolis-Hastings algorithm. I am using an MCMC approach to model the likelihood, given various covariates, that an individual will be assigned a particular rank in a social status hierarchy by a 3rd party (the judge): each judge (approx 80, across 4 villages) was asked to rank a group of individuals (approx 80, across 4 villages) based on their assessment of each individual's social status. Therefore, for each judge I have a vector of ranks corresponding to their judgement of each individual's position in the hierarchy.
To model this I assume that, when assigning ranks, judges are basing their decisions on the relative value of some latent measure of an individual's utility, u. Given this, it can then be assumed that a vector of ranks, r, produced by a given judge is a function of an unobserved vector, u, describing the utility of the individuals being ranked, where the individual with the kth highest value of u will be assigned the kth rank. I model u, using the covariates of interest, as a multivariate normally distributed variable and then determine the likelihood of the observed ranks, given the distribution of u generated by the model.
In addition to estimating the effect of, at most, 5 covariates, I also estimate hyperparameters describing the variance between judges and between items. Therefore, for every iteration of the chain I evaluate a multivariate normal density approximately 8-10 times. As a result, 5000 iterations can take up to 14 hours. Obviously, I need to run it for far more than 5000 iterations, and so I need a means of dramatically speeding up the process. Given this, my questions are as follows:
(i) Am I right to assume that the best speed gains will be had by running some, if not all of my chain in C or C++?
(ii) Assuming the answer to (i) is yes, how do I go about this? For example, is there a way for me to retain all my R functions but simply do the looping in C or C++? That is, can I call my R functions from C and then do the looping there?
(iii) I guess what I really want to know is how best to approach the incorporation of C or C++ code into my program.
First make sure your slow R version is correct. Debugging R code might be easier than debugging C code. Done that? Great. You now have correct code you can compare against.
Next, find out what is taking the time. Use Rprof to run your code and see what is taking the time. I did this for some code I inherited once, and discovered it was spending 90% of the time in the t() function. This was because the programmer had a matrix, A, and was doing t(A) in a zillion places. I did one tA=t(A) at the start, and replaced every t(A) with tA. Massive speedup for no effort. Profile your code first.
Now, you've found your bottleneck. Is it code you can speed up in R? Is it a loop that you can vectorise? Do that. Check your results against your gold-standard correct code. Always. Yes, I know it's hard to compare algorithms that rely on random numbers, so set the seeds the same and try again.
Still not fast enough? Okay, now maybe you need to rewrite parts (the lowest level parts, generally, and those that were taking the most time in the profiling) in C or C++ or Fortran, or if you are really going for it, in GPU code.
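For the "rewrite parts in C" step, the lowest-friction mechanism (much less polished than Rcpp, but dependency-free) is R's .C interface: write a C function that takes pointers, build it with R CMD SHLIB, and call it from R. A minimal sketch with made-up names (loop.c, my_loop), computing a running sum as a stand-in for your real inner loop:

    /* loop.c -- build with: R CMD SHLIB loop.c
       .C passes every argument as a pointer, so lengths must be
       passed explicitly and the result written into an output
       buffer allocated on the R side. */
    #include <R.h>

    void my_loop(double *x, int *n, double *out)
    {
        double acc = 0.0;
        for (int i = 0; i < *n; i++) {
            acc += x[i];                /* the loop you moved out of R */
            out[i] = acc;
        }
    }

From R you would then do dyn.load("loop.so") followed by .C("my_loop", as.double(x), as.integer(length(x)), out = double(length(x)))$out. Rcpp, discussed below, makes all of this considerably more pleasant.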
Again, really check the code is giving the same answers as the correct R code. Really check it. If at this stage you find any bugs anywhere in the general method, fix them in what you thought was the correct R code and in your latest version, and rerun all your tests. Build lots of automatic tests. Run them often.
Read up about code refactoring. It's called refactoring because if you tell your boss you are rewriting your code, he or she will say 'why didn't you write it correctly first time?'. If you say you are refactoring your code, they'll say "hmmm... good". THIS ACTUALLY HAPPENS.
As others have said, Rcpp is made of win.
A complete example using R, C++ and Rcpp is provided by this blog post, which was inspired by a post on Darren Wilkinson's blog (and he has more follow-ups). The example is also included with recent releases of Rcpp in the RcppGibbs directory and should get you going.
I have a blog post which discusses exactly this topic which I suggest you take a look at:
http://darrenjw.wordpress.com/2011/07/31/faster-gibbs-sampling-mcmc-from-within-r/
(this post is more relevant than the post of mine that Dirk refers to).
I think the best method currently to integrate C or C++ is Dirk Eddelbuettel's Rcpp package. You can find a lot of information on his website. There is also a talk at Google, available on YouTube, that might be interesting.
Check out this project:
https://github.com/armstrtw/rcppbugs
Also, here is a link to the R/Fin 2012 talk:
https://github.com/downloads/armstrtw/rcppbugs/rcppbugs.pdf
I would suggest benchmarking each step of the MCMC sampler and identifying the bottleneck. If you put each full conditional or M-H step into a function, you can use the R compiler package, which might give you a 5-10% speed gain. The next step is to use Rcpp.
I think it would be really nice to have a general-purpose Rcpp function which generates a single draw using the M-H algorithm, given a likelihood function.
However, with Rcpp some things become difficult if you only know the R language: non-standard random distributions (especially truncated ones) and using arrays. You have to think more like a C programmer there.
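As an example of the truncated-distribution pain: in C, a truncated normal draw is often coded as rejection sampling on top of a Box-Muller standard normal, along these lines (a sketch; fine when the interval [a, b] carries reasonable probability mass, arbitrarily slow far out in the tails):

    #include <math.h>
    #include <stdlib.h>

    #define TWO_PI 6.283185307179586

    /* standard normal draw via the Box-Muller transform */
    double rnorm_std(void)
    {
        double u1 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);  /* in (0,1) */
        double u2 = (rand() + 1.0) / ((double) RAND_MAX + 2.0);
        return sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
    }

    /* N(mu, sigma^2) truncated to [a, b], by naive rejection:
       keep drawing until a draw lands inside the interval */
    double rtruncnorm(double mu, double sigma, double a, double b)
    {
        for (;;) {
            double x = mu + sigma * rnorm_std();
            if (x >= a && x <= b)
                return x;
        }
    }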
The multivariate normal density is actually a big issue in R. dmvnorm is very inefficient and slow; dmnorm is faster, but it gave me NaNs sooner than dmvnorm in some models.
Neither takes an array of covariance matrices, so it is impossible to vectorise the code in many instances. As long as you have a common covariance and common means, however, you can vectorise, which is the R-ish strategy for speeding things up (and the opposite of what you would do in C).
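In C the natural shape is the opposite of vectorisation: one tight log-density function for a single observation, called in a loop, with the expensive O(k^3) work hoisted out. A sketch, assuming you have already computed the lower-triangular Cholesky factor L of the (shared) covariance:

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* log N(x | mu, Sigma), Sigma = L L' with L lower triangular,
       stored row-major as L[i*k + j], k = dimension.  Solves
       L z = x - mu by forward substitution; the quadratic form is
       then z'z and log|Sigma| = 2 * sum_i log(L[i][i]). */
    double dmvnorm_log(const double *x, const double *mu,
                       const double *L, int k)
    {
        double quad = 0.0, logdet = 0.0;
        double z[64];                      /* sketch assumes k <= 64 */

        for (int i = 0; i < k; i++) {
            double s = x[i] - mu[i];
            for (int j = 0; j < i; j++)
                s -= L[i * k + j] * z[j];  /* forward substitution */
            z[i] = s / L[i * k + i];
            quad   += z[i] * z[i];
            logdet += log(L[i * k + i]);
        }
        return -0.5 * (k * log(2.0 * M_PI) + 2.0 * logdet + quad);
    }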
I created a special-purpose "programming language" that deliberately (by design) cannot evaluate the same piece of code twice (i.e. it cannot loop). It essentially is made to describe a flowchart-like process where each element in the flowchart is a conditional that performs a different test on the same set of data (without being able to modify it). Branches can split and merge, but never in a circular fashion, i.e. the flowchart cannot loop back onto itself. When arriving at the end of a branch, the current state is returned and the program exits.
When written down, a typical program superficially resembles a program in a purely functional language, except that no form of recursion is allowed and functions can never return anything; the only way to exit a function is to call another function, or to invoke a general exit statement that returns the current state. A similar effect could also be achieved by taking a structured programming language and removing all loop statements, or by taking an "unstructured" programming language and forbidding any goto or jmp statement that goes backwards in the code.
Now my question is: is there a concise and accurate way to describe such a language? I don't have any formal CS background and it is difficult for me to understand articles about automata theory and formal language theory, so I'm a bit at a loss. I know my language is not Turing complete, and, through great pain, I managed to assure myself that it can probably be classified as describing a "regular language" (i.e. one that can be recognized by a read-only Turing machine), but is there a more specific term?
Bonus points if the term is intuitively understandable to an audience that is well-versed in general programming concepts but doesn't have a formal CS background. Also bonus points if there is a specific kind of machine or automaton that evaluates such a language. Oh yeah, keep in mind that we're not evaluating a stream of data - every element has (read-only) access to the full set of input data. :)
I believe that your language is sufficiently powerful to encode precisely the star-free languages. This is a subset of the regular languages in which no expression contains a Kleene star. In other words, it's the smallest family of languages containing the empty string, the empty set, and the individual characters that is closed under concatenation, union, and complementation. This is equivalent to the set of languages accepted by DFAs that don't have any directed cycles in them.
I can attempt a proof of this here given your description of your language, though I'm not sure it will work precisely correctly because I don't have full access to your language. The assumptions I'm making are as follows:
No functions ever return. Once a function is called, it will never return control flow to the caller.
All calls are resolved statically (that is, you can look at the source code and construct a graph of each function and the set of functions it calls). In other words, there aren't any function pointers.
The call graph is acyclic; for any functions A and B, exactly one of the following holds: A transitively calls B, B transitively calls A, or neither A nor B transitively calls the other.
More generally, the control flow graph is acyclic. Once an expression evaluates, it never evaluates again. This allows us to generalize the above so that instead of thinking of functions calling other functions, we can think of the program as a series of statements that all call one another as a DAG.
Your input is a string where each letter is scanned once and only once, and in the order in which it's given (which seems reasonable given the fact that you're trying to model flowcharts).
Given these assumptions, here's a proof that your programs accept a language iff that language is star-free.
To prove that if there's a star-free language, there's a program in your language that accepts it, begin by constructing the minimum-state DFA for that language. That DFA is loop-free and scans the input exactly once, and so it should be easy to build a program in your language from it. In particular, given a state s with a set of transitions to other states based on the next symbol of input, you can write a function that looks at the next character of input and then calls the function encoding the state being transitioned to. Since the DFA has no directed cycles, the function calls have no directed cycles, and so each statement will be executed at most once. We now have that (∀ R, if R is a star-free language, then ∃ a program in your language that accepts it).
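To see the construction in miniature, here is a hypothetical C rendering of a tiny acyclic DFA that accepts exactly the string "ab": one function per state, control only ever flows forward, and no statement can run twice (the functions return an accept/reject flag purely so the result can be printed):

    #include <stdio.h>

    static int state_accept(const char *s) { return *s == '\0'; }
    static int state_reject(const char *s) { (void) s; return 0; }

    /* saw 'a'; a 'b' must come next */
    static int state_saw_a(const char *s)
    {
        return (*s == 'b') ? state_accept(s + 1) : state_reject(s);
    }

    /* start state: expects an 'a' */
    static int state_start(const char *s)
    {
        return (*s == 'a') ? state_saw_a(s + 1) : state_reject(s);
    }

    int main(void)
    {
        printf("%d %d %d\n",
               state_start("ab"),    /* 1: accepted */
               state_start("aa"),    /* 0: rejected */
               state_start("abc"));  /* 0: rejected, trailing input */
        return 0;
    }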
To prove the reverse direction of implication, we essentially reverse this construction and create an ε-NFA with no cycles that corresponds to your program. Doing a subset construction on this NFA to reduce it to a DFA will not introduce any cycles, and so you'll have a star-free language. The construction is as follows: for each statement si in your program, create a state qi with a transition to each of the states corresponding to the other statements in your program that are one hop away from that statement. The transitions to those states will be labeled with the symbols of input consumed in making each of the decisions, or ε if the transition occurs without consuming any input. This shows that (∀ programs P in your language, ∃ a star-free language R containing exactly the strings accepted by your program).
Taken together, this shows that your programs have identically the power of the star-free languages.
Of course, the assumptions I made on what your programs can do might be too limited. You might have random-access to the input sequence, which I think can be handled with a modification of the above construction. If you can potentially have cycles in execution, then this whole construction breaks. But, even if I'm wrong, I still had a lot of fun thinking about this, and thank you for an enjoyable evening. :-)
Hope this helps!
I know this question is somewhat old, but for posterity, the phrase you are looking for is "decision tree". See http://en.wikipedia.org/wiki/Decision_tree_model for details. I believe this captures exactly what you have done and has a pretty descriptive name to boot!
I'm looking for some advice on how to go about implementing Gradient (steepest) Descent in C. I am finding the minimum of f(x)=||Ax-y||^2, with A(n,n) and y(n) given.
This is difficult in C (I think) because computing the gradient, ∇f(x) = [df/dx(1), ..., df/dx(n)], requires calculating derivatives.
I just wanted to throw this at SO to get some direction on going about programming this, e.g.:
1) What dimensionality would be best to start with (1,2,...)
2) Advice on how to go about doing the partial derivatives
3) Whether I should implement in an easier language, like python, first -- then translate over to C
4) Etc.
Let me know your thoughts! Thanks in advance
1) Start in 2D, this way you can plot the path of the descent and actually see your algorithm working.
2) df/dx = (f(x+h)-f(x-h))/(2*h) if evaluating f is cheap, or (f(x+h)-f(x))/h if it is expensive. The choice of h should balance truncation error (which dominates for big h) and roundoff error (which dominates for small h). Typical values are h ~ pow(DBL_EPSILON, 1./3), but the exact exponent depends on the formula for the derivative, and ideally there should be a prefactor that depends on f. You may plot the numerical derivative as a function of h on a log scale, for some given sample points in the parameter space; you will then clearly see the range of h that is optimal for the points you are sampling. (A sketch of the central-difference version follows this list.)
3) Yes whatever you find easier.
4) The hard point is finding the optimal step size. You may want to use an inner loop here to search for the optimal step.
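Regarding 2), here is a minimal sketch of the central-difference formula in C, with h scaled as suggested above (square is just a stand-in test function):

    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    /* df/dx ~ (f(x+h) - f(x-h)) / (2h); pow(DBL_EPSILON, 1./3)
       balances truncation error against roundoff error, and the
       fmax factor scales h to the magnitude of x */
    double deriv_central(double (*f)(double), double x)
    {
        double h = pow(DBL_EPSILON, 1.0 / 3.0) * fmax(fabs(x), 1.0);
        return (f(x + h) - f(x - h)) / (2.0 * h);
    }

    static double square(double x) { return x * x; }

    int main(void)
    {
        printf("%.10f\n", deriv_central(square, 3.0));  /* exact: 6 */
        return 0;
    }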
1) I'd start with a simple 1D example, and then go to 2D once I'm sure that works.
2) As you know the objective function beforehand, maybe you can supply an analytical gradient as well. If possible, that is (almost) always better than resorting to numerical derivatives.
3) By all means.
4) Presumably steepest descent is only the first step, next maybe look into something like CG or BFGS.
I am finding the minimum of f(x)=||Ax-y||^2, with A(n,n) and y(n) given.
This problem is known as least squares, and you are doing unconstrained optimization. Writing a finite-difference gradient descent solver in C is not the right approach at all. First of all you can easily calculate the derivative analytically, so there is no reason to do finite difference. Also, the problem is convex, so it even gets easier.
(Let A' denote the transpose of A)
d/dx ||Ax - y||^2 = 2*A'*(Ax - y)
Since this is a convex problem, we know the global minimum will occur where the derivative is 0:
0 = 2*A'(Ax - y)
A'y = A'Ax
inverse(A'A)*A'y = x
A'A is positive definite, and hence invertible, whenever A is nonsingular, so the problem reduces to computing this inverse, which is O(n^3). That said, there are libraries to do least squares in both C and Python, so you should probably just use them instead of writing your own code.
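For completeness, here is what the analytic-gradient version of the original plan looks like in C: fixed-step gradient descent on ||Ax - y||^2 using the gradient 2*A'(Ax - y) derived above (a toy sketch; a fixed step only converges if it is small enough relative to the largest eigenvalue of A'A, which is exactly why the closed-form solution or a library routine is preferable):

    #include <stdio.h>

    #define N 2

    /* one gradient-descent step on f(x) = ||Ax - y||^2:
       r = A x - y, g = 2 A' r, x <- x - step * g */
    static void gd_step(const double A[N][N], const double y[N],
                        double x[N], double step)
    {
        double r[N], g[N];
        for (int i = 0; i < N; i++) {
            r[i] = -y[i];
            for (int j = 0; j < N; j++)
                r[i] += A[i][j] * x[j];
        }
        for (int j = 0; j < N; j++) {
            g[j] = 0.0;
            for (int i = 0; i < N; i++)
                g[j] += 2.0 * A[i][j] * r[i];
        }
        for (int j = 0; j < N; j++)
            x[j] -= step * g[j];
    }

    int main(void)
    {
        double A[N][N] = {{2, 0}, {0, 1}};  /* toy example          */
        double y[N]    = {2, 3};            /* exact answer: (1, 3) */
        double x[N]    = {0, 0};
        for (int k = 0; k < 200; k++)
            gd_step(A, y, x, 0.1);
        printf("x = (%f, %f)\n", x[0], x[1]);
        return 0;
    }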