Comparing the efficiency of different binomial coefficient algorithms - C

I've compared two algorithms for calculating the binomial coefficient C(n, k), shown below: #1 computes it directly from the formulaic definition of the binomial coefficient, #2 uses dynamic programming.
#include <stdio.h>
#include <sys/time.h>

#define min(x, y) (x<y?x:y)
#define NMAX 150

double binomial_formula(int n, int k) {
    double denominator=1, numerator=1, i;
    for (i = 0; i < k; i++)
        numerator *= (n-i), denominator *= (i+1);
    return numerator/denominator;
}

double binomial_dynamic_pro(int n, int k) {
    double c[NMAX][NMAX];
    int i, j;
    for (i = 0; i <= n; i++) {
        for (j = 0; j <= min(i, k); j++) {
            if (i == j || j == 0)
                c[i][j] = 1;
            else
                c[i][j] = c[i-1][j-1] + c[i-1][j];
        }
    }
    return c[n][k];
}

int main(void) {
    struct timeval s, e;
    int n = 50, k = 30;
    double re = 0;

    printf("now formula calc C(%d, %d)..\n", n, k);
    gettimeofday(&s, NULL);
    re = binomial_formula(n, k);
    gettimeofday(&e, NULL);
    printf("%.0f, use time: %ld'us\n", re,
           1000000*(e.tv_sec-s.tv_sec) + (e.tv_usec-s.tv_usec));

    printf("now dynamic calc C(%d, %d)..\n", n, k);
    gettimeofday(&s, NULL);
    re = binomial_dynamic_pro(n, k);
    gettimeofday(&e, NULL);
    printf("%.0f, use time: %ld'us\n", re,
           1000000*(e.tv_sec-s.tv_sec) + (e.tv_usec-s.tv_usec));
    return 0;
}
I compile it with gcc, and the output looks like this:
now formula calc C(50, 30)..
47129212243960, use time: 2'us
now dynamic calc C(50, 30)..
47129212243960, use time: 102'us
These results are unexpected to me. I thought the dynamic programming version should be faster, since it is O(nk), while I assumed the formula method was O(k^2), and it uses multiplication and division, which should also be slower.
So why is the dynamic programming version so much slower?

binomial_formula as written is definitely not O(k^2). It has only a single loop of size k, making it O(k). You should also keep in mind that on modern architectures the cost of a memory access dwarfs the cost of any single arithmetic instruction by an order of magnitude or more, and your dynamic programming solution reads and writes many more memory addresses. The first version can be computed entirely in a few registers.
Note that you can actually improve on your linear version by recognizing that C(n,k) == C(n, n-k):
double binomial_formula(int n, int k) {
    double denominator=1, numerator=1, i;
    if (k > n/2)
        k = n - k;
    for (i = 0; i < k; i++)
        numerator *= (n-i), denominator *= (i+1);
    return numerator / denominator;
}
You should keep in mind that dynamic programming is just a technique and not a silver bullet. It doesn't magically make all algorithms faster.

First algorithm:
Takes O(k) time (a single loop of k iterations)
Uses a constant amount of space
Second algorithm:
Takes O(nk) time
Uses O(nk) space for the table
In terms of time and space the first algorithm is better, but the second algorithm has the advantage of computing the answers for all smaller values as well, so it can be used as a pre-processing step.
Imagine that you are given a number of queries of the form "n k" and are asked to output n choose k for each of them. Further, imagine that the number of queries q is large (say around n*n). Using the first algorithm takes O(q*n) = O(n*n*n), while pre-computing the whole table once with the second algorithm takes O(n*n), after which each query is answered in O(1).
So it all depends on what you are trying to do.
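To make the pre-processing idea concrete, here is a minimal sketch (mine, not from the answer) that fills Pascal's triangle once and then answers any C(n, k) query with a single table lookup; the table bound NMAX and the use of double follow the question's code:

#include <stdio.h>

#define NMAX 150

static double c[NMAX][NMAX];   /* c[n][k] holds C(n, k) after precompute() */

static void precompute(void) {
    for (int n = 0; n < NMAX; n++) {
        c[n][0] = c[n][n] = 1;
        for (int k = 1; k < n; k++)
            c[n][k] = c[n-1][k-1] + c[n-1][k];   /* Pascal's rule */
    }
}

int main(void) {
    precompute();                    /* O(NMAX^2) work, done once */
    printf("%.0f\n", c[50][30]);     /* each query afterwards is O(1) */
    printf("%.0f\n", c[49][25]);
    return 0;
}

The one-time O(n^2) fill is what pays for the O(1) lookups once the number of queries grows large.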

Reduce execution time of prime number generator

I have to print the prime numbers between two limits m and n, for t test cases.
I created a variable t, and two pointers m and n that point to blocks of memory reserved for t integer values each.
I used pointers instead of plain arrays, hoping for faster operations.
The outer for loop iterates over the test cases.
The inner for loop prints the primes from m[i] to n[i].
Code
#include <stdio.h>
#include <stdlib.h>

int is_prime(int);

int main(void) {
    int t;
    scanf("%d", &t);                      /* read t before allocating */
    int *m = malloc(sizeof(int) * t);
    int *n = malloc(sizeof(int) * t);
    for (int i = 0; i < t; i++) {
        scanf("%d %d", &m[i], &n[i]);
        for (int j = m[i]; j <= n[i]; j++) {
            if (is_prime(j)) {
                printf("%d\n", j);
            }
        }
        if (i < t - 1) printf("\n");
    }
    return 0;
}

int is_prime(int num)
{
    if (num <= 1) return 0;
    if (num % 2 == 0 && num > 2) return 0;
    for (int i = 3; i < num / 2; i += 2) {
        if (num % i == 0)
            return 0;
    }
    return 1;
}
Problem: http://www.spoj.com/problems/PRIME1/
The code compiles and runs correctly on http://ideone.com, but I get a "time limit exceeded" error when I submit it on SPOJ. How can I reduce the execution time of this prime number generator?
As @Carcigenicate suggests, you're exceeding the time limit because your prime generator is too slow; and it's too slow because you're using an inefficient algorithm.
Indeed, you should not simply test each consecutive number for primality (which, by the way, you're also doing inefficiently), but rather rule out multiple values at once using known primes (and perhaps additional primes which you compute). For example, you don't need to check multiples of 5 and 10 (other than the value 5 itself) for primality, since you know that 5 divides them. So just "mark" the multiples of various primes as irrelevant.
... and of course, that's just to get you started; there are all sorts of tricks you could use for optimization - both algorithmic and implementation-related.
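As a concrete illustration of "marking multiples", here is a minimal segmented-sieve sketch of my own (not part of this answer): it sieves the small primes up to sqrt(10^9) once, then marks their multiples inside each queried range [m, n]. The limits assumed here (values up to about 10^9, ranges of at most 100000 numbers) are the ones stated for PRIME1:

#include <stdio.h>
#include <string.h>

#define LIMIT 31623             /* a bit above sqrt(1e9) */

static char small_composite[LIMIT + 1];   /* 1 = marked as composite */

/* classic Sieve of Eratosthenes over [2, LIMIT] */
static void sieve_small(void) {
    for (long i = 2; i * i <= LIMIT; i++)
        if (!small_composite[i])
            for (long j = i * i; j <= LIMIT; j += i)
                small_composite[j] = 1;
}

/* print all primes in [m, n]; n - m is assumed to be at most 100000 */
static void print_primes_in_range(long m, long n) {
    static char composite[100001];
    if (m < 2) m = 2;
    memset(composite, 0, n - m + 1);
    for (long p = 2; p * p <= n; p++) {
        if (small_composite[p]) continue;
        long start = ((m + p - 1) / p) * p;    /* first multiple of p >= m */
        if (start < p * p) start = p * p;      /* never mark p itself */
        for (long j = start; j <= n; j += p)
            composite[j - m] = 1;
    }
    for (long j = m; j <= n; j++)
        if (!composite[j - m])
            printf("%ld\n", j);
}

int main(void) {
    int t;
    sieve_small();
    if (scanf("%d", &t) != 1) return 1;
    while (t--) {
        long m, n;
        scanf("%ld %ld", &m, &n);
        print_primes_in_range(m, n);
        if (t) printf("\n");
    }
    return 0;
}

The work per range is roughly proportional to its width, instead of a full trial division for every candidate.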
I know that you are looking for algorithm improvements, but the following technical optimizations might help:
If you are using Visual Studio, you can use alloca instead of malloc, so that n and m go on the stack instead of the heap.
You can also try to rewrite your algorithm using arrays instead of pointers to put n and m on the stack.
If you want to keep using pointers, add the __restrict keyword after the asterisks, which tells the compiler that the two pointers never alias the same memory.
You can even do it without using pointers or arrays:
#include <stdio.h>
#include <math.h>

int is_prime(long n) {
    if (n < 2)
        return 0;
    if (n == 2)
        return 1;
    if (n % 2 == 0)
        return 0;
    for (long i = 3; i <= sqrt(n); i += 2) {
        if (n % i == 0)
            return 0;
    }
    return 1;
}

int main() {
    int t;
    scanf("%d", &t);
    while (t--) {
        long n, m;
        scanf("%ld %ld", &n, &m);
        for (long i = n; i <= m; i++) {
            if (is_prime(i) == 1)
                printf("%ld\n", i);
        }
    }
    return 0;
}
There are several ways to improve the primality check for an integer n. Here are a few that you might find useful.
Reduce the number of checks: a well-known theorem says that if you want to look for factors of n, say n = a * b, then it is enough to look for a divisor between 1 and sqrt(n). (The proof is straightforward: if n = a * b, then either a = b = sqrt(n), or a < sqrt(n) < b, or b < sqrt(n) < a; in every case there is a factor of n between 1 and sqrt(n).)
Use a Sieve of Eratosthenes: this lets you discard candidates that have already been disqualified as multiples of earlier primes (see Sieve of Eratosthenes (Wikipedia)).
Use probabilistic algorithms: the most efficient way to check primality nowadays is to use a probabilistic test such as Miller-Rabin. It is a bit more complex to implement, but it is far more efficient for large numbers. You can find a few of these techniques here (Wikipedia).
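As an illustration of the last point, here is a Miller-Rabin sketch of my own (not part of this answer), specialised to 32-bit inputs; for n below roughly 3.2 * 10^9 the witness set {2, 3, 5, 7} is known to make the test deterministic:

#include <stdint.h>
#include <stdio.h>

/* (b^e) mod m, with m < 2^32 so the 64-bit products cannot overflow */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    b %= m;
    while (e > 0) {
        if (e & 1)
            r = (r * b) % m;
        b = (b * b) % m;
        e >>= 1;
    }
    return r;
}

/* Miller-Rabin with witnesses 2, 3, 5, 7: deterministic below ~3.2e9 */
static int is_prime_mr(uint32_t n) {
    static const uint32_t witnesses[] = {2, 3, 5, 7};
    if (n < 2) return 0;
    for (int i = 0; i < 4; i++)
        if (n == witnesses[i]) return 1;
    if (n % 2 == 0 || n % 3 == 0 || n % 5 == 0 || n % 7 == 0) return 0;

    uint32_t d = n - 1;
    int s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }   /* n - 1 = d * 2^s, d odd */

    for (int i = 0; i < 4; i++) {
        uint64_t x = powmod(witnesses[i], d, n);
        if (x == 1 || x == n - 1) continue;
        int maybe_prime = 0;
        for (int r = 1; r < s; r++) {
            x = (x * x) % n;
            if (x == n - 1) { maybe_prime = 1; break; }
        }
        if (!maybe_prime) return 0;          /* this witness proves n composite */
    }
    return 1;
}

int main(void) {
    /* expect: 1 1 0 */
    printf("%d %d %d\n", is_prime_mr(2), is_prime_mr(1000000007u), is_prime_mr(1000000000u));
    return 0;
}

Each test costs only O(log n) modular multiplications per witness, independent of how small n's factors are, which is what makes it attractive for large n.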

sum of all arr[j] where i divides j

What is the best way to find the sum of all elements of an array whose index is divisible by a given number p, with the least complexity?
I have written the code below, but that's brute force. Can I do better than that?
#include <stdio.h>

int main() {
    int n;
    int mod = 1000000000 + 7;
    scanf("%d", &n);
    int arr[n+1];
    int i;
    for (i = 1; i <= n; ++i) {
        scanf("%d", &arr[i]);
    }
    int p;
    scanf("%d", &p);
    int sum = 0;
    int j;
    for (j = p; j <= n; j = j + p) {
        sum = (sum + arr[j]) % mod;
    }
    printf("%d\n", sum);
    return 0;
}
You remark that your example implementation is "brute force" and ask whether you can "do better". Brute force usually implies an approach that is simple to implement but performs substantially more work or uses substantially more memory than is theoretically necessary. It suggests devoting overwhelming resources in place of efficient operation. Often, "substantially more" boils down to such approaches having higher asymptotic complexity than the best possible approaches.
Your example implementation is not like that. Summing n / p arbitrary numbers requires n / p additions, so that is the least amount of work any algorithm for the task can do (O(n) in the worst case, p = 1). That is the asymptotic complexity of your implementation, so it cannot be improved in that sense.
Furthermore, your implementation appears to perform about as few overall operations as you could hope for. Consider this naive, alternative, worse implementation of the summation loop:
for (j = 1; j <= n; j++) {
    if (j % p == 0) {
        sum = (sum + arr[j]) % mod;
    }
}
That could be viewed as a somewhat more direct translation of the requirement into C code. Although it's still only O(n), it might reasonably be characterized as a brute force implementation because of the (p-1)-fold excess of increments to j and the n computations of j % p, both of which your implementation avoids.
Bottom line: no, there is no substantially more efficient implementation than the one you present.

Calculate the sum of 1+(1/2!)+…+(1/n!) for n terms in C

Like the title says, how do I calculate the sum of n terms of the form 1+(1/2!)+⋯+(1/n!)? I already have the code for the harmonic series:
#include <stdio.h>
#include <conio.h>   /* for getch() (Dev-C++ / Windows) */

int main( void )
{
    int v = 0, i, ch;
    double x = 0.;

    printf("Enter a number to compute the sum: ");
    while (scanf("%d", &v) == 0 || v <= 0)
    {
        printf("Please enter a positive number: ");
        while ((ch = getchar()) != '\n')
            if (ch == EOF)
                return 1;
    }
    for (i = v; i >= 1; i--)
        x += 1./i;
    printf("The value of the series is %f\n", x);
    getch();
    return 0;
}
The question here is: I already have the sum as a fraction, but how do I make the variable i a factorial?
Note: I'm programming in C, with Dev-C++ 4.9.9.2.
You get a slightly more accurate answer for the harmonic series by summing 1./v + 1./(v-1) + ... + 1./1, smallest terms first. I suggest you stay with that order.
[edit] Rewrite: thanks to @pablo197 for pointing out the error of my ways.
To calculate both the harmonic series and 1+(1/2!)+…+(1/n!), keep summing the least significant terms first, as that helps minimize precision loss. Start with the least significant term 1/n as the running sum; adding the (n-1) term gives sum = (1 + sum)/(n-1), and so on. (See the expansion below.)
double x = 0.0;
double one_over_factorial_series = 0.0;
for (i = v; i >= 1; i--) {
    x += 1.0/i;
    one_over_factorial_series = (one_over_factorial_series + 1)/i;
}
printf("harmonic:%le\n", x);
// 2.828968e+00
printf("one_over_factorial:%.10le\n", one_over_factorial_series);
// 1.7182815256e+00
Add 1.0 (that is, 1/0!) to one_over_factorial_series and the result is about e = 2.7182818284...
[Edit] Detail showing how a direct n! calculation is avoided:
1 + (1/2!) + … + (1/n!) =
1/n! + 1/((n-1)!) + 1/((n-2)!) + 1/((n-3)!) + ... + 1 =
(1/n + 1)/((n-1)!) + 1/((n-2)!) + 1/((n-3)!) + ... + 1 =
((1/n + 1)/(n-1) + 1)/((n-2)!) + 1/((n-3)!) + ... + 1 =
...
((((1/n + 1)/(n-1) + 1)/(n-2) + 1)/(n-3) + 1)/(n-4) + ... =
If you're just looking to compute the first n factorials, I would suggest computing them iteratively from the recurrence, e.g.
factorial[0] = 1;
for (i = 1; i < n; i++)
    factorial[i] = factorial[i-1] * i;
However, unless you store them as floating point numbers, the large factorials are going to overflow really quickly.
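To make the overflow warning concrete, here is a small check of my own (not from this answer): with a 64-bit unsigned integer the factorial is exact only up to 20!, while a double keeps going with limited precision:

#include <stdio.h>

int main(void) {
    unsigned long long fi = 1;   /* exact integer factorial, until it overflows */
    double fd = 1.0;             /* approximate floating-point factorial */
    for (int i = 1; i <= 22; i++) {
        fi *= (unsigned long long)i;
        fd *= i;
        printf("%2d! integer: %20llu   double: %.6e\n", i, fi, fd);
    }
    /* 20! = 2432902008176640000 still fits in 64 bits; 21! silently wraps around */
    return 0;
}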
Calculating the factorial explicitly is a bad thing to do in this case, because it can overflow even for small values of N. Use the following pseudo-code to get the result in O(N) without overflow.
double sum = 0.0;
double acc = 1;
double error = 0.0000001;
for (int i = 1; i <= n; i++) {
    acc = acc / i;        /* acc is now 1/i! */
    if (acc < error)
        break;            /* remaining terms are negligible */
    sum = sum + acc;
}
printf("%f\n", sum);
A more accurate way of doing it, though I feel it is unnecessary in the case of factorials:
double sum = 0.0;
for (int i = n; i >= 1; i--) {
    sum = (sum + 1) / i;
}
printf("%f\n", sum);
Note: because the above method sums in reverse it is more accurate, but it is also more time consuming: it is always O(N), even for large N, since it cannot break out early. Meanwhile the gain in accuracy is negligible, because the factorial grows so fast that the error term in the forward method shrinks very quickly anyway.
The number n! is equal to the product of n and the preceding factorial, that is, (n - 1)!.
If you calculate n! in an iteration, you are doing n products.
In the next step, say n+1, you would repeat these n products and then multiply by n+1.
This means that you are repeating the same operations again and again.
It is a better strategy to keep the factorial calculated at step n and, at step n+1, just multiply n! by n+1. This reduces the number of products to one per iteration.
Thus, you can calculate the series in the following way:
int max_n = 20;         /* This value can come from another point of the program */
int n;                  /* Initial value of the index */
double factorial_n = 1; /* It has to be initialized to 1, since the factorial of 0 is 1 */
double sum = 0.0;       /* It has to be initialized to 0, in order to calculate the series */

for (n = 0; n <= max_n; )
{
    sum += 1.0/factorial_n;
    n++;
    factorial_n *= n;
}
printf("Series result: %.20f\n", sum);
There are some numerical issues with this approach, but they go beyond the scope of your question.
About overflow: it is necessary to be careful about overflow of the factorial after several iterations. However, I will not write code to handle overflow here.
EDIT
I think you do not have to follow the suggestions of the people who advise using a factorial function. That approach is very inefficient, since a lot of products are repeated in every iteration.
In comparison with that approach, mine is better.
However, if you plan to evaluate this series very often, then my approach is not efficient anymore. The right technique is then the one pointed out in Bli0042's answer: hold the factorials in an array and just use them every time you need them, without calculating them again and again.
The resulting program would be this:
#include <stdio.h>

#define MAX_N 100

double factorial[MAX_N+1];

void build_factorials(double *factorial, int max)
{
    factorial[0] = 1.0;
    for (int j = 0; j < max; )      /* fills factorial[1] .. factorial[max] */
    {
        j++;
        factorial[j] = factorial[j-1] * j;
    }
}

double exp_series(int n)
{
    int j;
    double sum;

    if (n > MAX_N)   /* Error */
        return 0.0;

    sum = 0.0;
    for (j = n; j >= 0; j--)
        sum += 1.0/factorial[j];

    return sum;
}

int main(void)
{
    build_factorials(factorial, MAX_N);

    printf("Series (up to n == 11): %.20f\n", exp_series(11));
    printf("Series (up to n == 17): %.20f\n", exp_series(17));
    printf("Series (up to n == 9): %.20f\n", exp_series(9));

    getchar();
    return 0;
}
The iteration is done in reverse order inside the function exp_series() in order to improve the numerical behaviour (that is, to reduce the loss of precision when summing the small terms).
The last code has a side effect: a global array is accessed inside the function exp_series(). However, I think that handling this differently would make my explanation more obscure.
Just take it into account.

LU Decomposition from Numerical Recipes not working; what am I doing wrong?

I've literally copied and pasted from the supplied source code of Numerical Recipes in C for in-place LU matrix decomposition; the problem is it's not working.
I'm sure I'm doing something stupid, but I would appreciate anyone being able to point me in the right direction on this; I've been working on it all day and can't see what I'm doing wrong.
POST-ANSWER UPDATE: The project is finished and working. Thanks to everyone for their guidance.
#include <stdlib.h>
#include <stdio.h>
#include <math.h>

#define MAT1 3
#define TINY 1e-20

int h_NR_LU_decomp(float *a, int *indx){
    //Taken from Numerical Recipies for C
    int i,imax,j,k;
    float big,dum,sum,temp;
    int n=MAT1;
    float vv[MAT1];
    int d=1.0;

    //Loop over rows to get implicit scaling info
    for (i=0;i<n;i++) {
        big=0.0;
        for (j=0;j<n;j++)
            if ((temp=fabs(a[i*MAT1+j])) > big)
                big=temp;
        if (big == 0.0) return -1; //Singular Matrix
        vv[i]=1.0/big;
    }
    //Outer kij loop
    for (j=0;j<n;j++) {
        for (i=0;i<j;i++) {
            sum=a[i*MAT1+j];
            for (k=0;k<i;k++)
                sum -= a[i*MAT1+k]*a[k*MAT1+j];
            a[i*MAT1+j]=sum;
        }
        big=0.0;
        //search for largest pivot
        for (i=j;i<n;i++) {
            sum=a[i*MAT1+j];
            for (k=0;k<j;k++) sum -= a[i*MAT1+k]*a[k*MAT1+j];
            a[i*MAT1+j]=sum;
            if ((dum=vv[i]*fabs(sum)) >= big) {
                big=dum;
                imax=i;
            }
        }
        //Do we need to swap any rows?
        if (j != imax) {
            for (k=0;k<n;k++) {
                dum=a[imax*MAT1+k];
                a[imax*MAT1+k]=a[j*MAT1+k];
                a[j*MAT1+k]=dum;
            }
            d = -d;
            vv[imax]=vv[j];
        }
        indx[j]=imax;
        if (a[j*MAT1+j] == 0.0) a[j*MAT1+j]=TINY;
        for (k=j+1;k<n;k++) {
            dum=1.0/(a[j*MAT1+j]);
            for (i=j+1;i<n;i++) a[i*MAT1+j] *= dum;
        }
    }
    return 0;
}

void main(){
    //3x3 Matrix
    float exampleA[]={1,3,-2,3,5,6,2,4,3};
    //pivot array (not used currently)
    int* h_pivot = (int *)malloc(sizeof(int)*MAT1);
    int retval = h_NR_LU_decomp(&exampleA[0],h_pivot);
    for (unsigned int i=0; i<3; i++){
        printf("\n%d:",h_pivot[i]);
        for (unsigned int j=0;j<3; j++){
            printf("%.1lf,",exampleA[i*3+j]);
        }
    }
}
WolframAlpha says the answer should be
1,3,-2
2,-2,7
3,2,-2
I'm getting:
2,4,3
0.2,2,-2.8
0.8,1,6.5
And so far I have found at least 3 different versions of the 'same' algorithm, so I'm completely confused.
PS: yes, I know there are at least a dozen different libraries to do this, but I'm more interested in understanding what I'm doing wrong than in the right answer.
PPS: since in LU decomposition the lower resulting matrix has a unit diagonal, and using Crout's algorithm as (I think) implemented, array index access is still safe, both L and U can be superimposed on each other in place; hence the single resulting matrix.
I think there's something inherently wrong with your indices. They sometimes have unusual start and end values, and the outer loop over j instead of i makes me suspicious.
Before you ask anyone to examine your code, here are a few suggestions:
double-check your indices
get rid of those obfuscation attempts using sum
use a macro a(i,j) instead of a[i*MAT1+j]
write sub-functions instead of comments
remove unnecessary parts, isolating the erroneous code
Here's a version that follows these suggestions:
#define MAT1 3
#define a(i,j) a[(i)*MAT1+(j)]

int h_NR_LU_decomp(float *a, int *indx)
{
    int i, j, k;
    int n = MAT1;

    for (i = 0; i < n; i++) {
        // compute R (the upper triangle), row i
        for (j = i; j < n; j++)
            for (k = 0; k < i; k++)
                a(i,j) -= a(i,k) * a(k,j);
        // compute L (the lower triangle), column i, dividing by the pivot
        for (j = i+1; j < n; j++) {
            for (k = 0; k < i; k++)
                a(j,i) -= a(j,k) * a(k,i);
            a(j,i) /= a(i,i);
        }
    }
    return 0;
}
Its main advantages are:
it's readable
it works
It lacks pivoting, though. Add sub-functions as needed.
My advice: don't copy someone else's code without understanding it.
Most programmers are bad programmers.
For the love of all that is holy, don't use Numerical Recipes code for anything except as a toy implementation for teaching purposes of the algorithms described in the text -- and, really, the text isn't that great. And, as you're learning, neither is the code.
Certainly don't put any Numerical Recipes routine in your own code -- the license is insanely restrictive, particularly given the code quality. You won't be able to distribute your own code if you have NR stuff in there.
See if your system already has a LAPACK library installed. It's the standard interface to linear algebra routines in computational science and engineering, and while it's not perfect, you'll be able to find LAPACK libraries for any machine you ever move your code to, and you can just compile, link, and run. If it's not already installed on your system, your package manager (rpm, apt-get, fink, port, whatever) probably knows about LAPACK and can install it for you. If not, as long as you have a Fortran compiler on your system, you can download and compile it from here, and the standard C bindings can be found just below on the same page.
The reason it's so handy to have a standard API to linear algebra routines is that they are so common, but their performance is so system-dependent. So for instance, Goto BLAS is an insanely fast implementation for x86 systems of the low-level operations which are needed for linear algebra; once you have LAPACK working, you can install that library to make everything as fast as possible.
Once you have any sort of LAPACK installed, the routine for doing an LU factorization of a general matrix is SGETRF for floats, or DGETRF for doubles. There are other, faster routines if you know something about the structure of the matrix - that it's symmetric positive definite, say (SPOTRF), or that it's tridiagonal (SGTTRF). It's a big library, but once you learn your way around it you'll have a very powerful piece of gear in your numerical toolbox.
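For illustration, here is a minimal sketch (mine, not from this answer) of factoring the question's 3x3 matrix with DGETRF through the LAPACKE C interface; it assumes a LAPACKE development package is installed and that you link with something like -llapacke -llapack:

#include <stdio.h>
#include <lapacke.h>

int main(void) {
    /* the question's matrix, row-major */
    double a[9] = { 1, 3, -2,
                    3, 5,  6,
                    2, 4,  3 };
    lapack_int ipiv[3];

    /* in-place LU factorization with partial pivoting */
    lapack_int info = LAPACKE_dgetrf(LAPACK_ROW_MAJOR, 3, 3, a, 3, ipiv);
    if (info != 0) {
        fprintf(stderr, "dgetrf failed, info = %d\n", (int)info);
        return 1;
    }
    for (int i = 0; i < 3; i++)
        printf("%10.5f %10.5f %10.5f   (ipiv[%d] = %d)\n",
               a[3*i], a[3*i+1], a[3*i+2], i, (int)ipiv[i]);
    return 0;
}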
The thing that looks most suspicious to me is the part marked "search for largest pivot". This does not only search but it also changes the matrix A. I find it hard to believe that is correct.
The different versions of the LU algorithm differ in pivoting, so make sure you understand that. You cannot compare the results of different algorithms directly. A better check is to see whether L times U equals your original matrix, or a permutation thereof if your algorithm does pivoting. That being said, your result is wrong because the determinant is wrong (pivoting does not change the determinant, except for the sign).
Apart from that, @Philip has good advice. If you want to understand the code, start by understanding LU decomposition without pivoting.
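To make the "check that L times U reproduces the (permuted) original" advice concrete, here is a small sketch of my own (not from these answers). It assumes the NR-style conventions used in the question: row-major storage, a combined in-place LU result with an implied unit diagonal for L, and indx[] recording which row was swapped with row j at step j:

#include <math.h>

#define MAT1 3

/* Rebuild L and U from the in-place result, replay the recorded row swaps on a
 * copy of the original matrix, and return the largest |(L*U) - (P*A)| entry. */
static double lu_check(const float *lu, const int *indx, const float *orig)
{
    double L[MAT1][MAT1] = {{0}}, U[MAT1][MAT1] = {{0}}, PA[MAT1][MAT1];
    int i, j, k;

    for (i = 0; i < MAT1; i++) {
        L[i][i] = 1.0;                           /* unit diagonal of L */
        for (j = 0; j < i; j++)    L[i][j] = lu[i*MAT1+j];
        for (j = i; j < MAT1; j++) U[i][j] = lu[i*MAT1+j];
        for (j = 0; j < MAT1; j++) PA[i][j] = orig[i*MAT1+j];
    }
    for (j = 0; j < MAT1; j++) {                 /* replay the row swaps */
        for (k = 0; k < MAT1; k++) {
            double tmp = PA[j][k];
            PA[j][k] = PA[indx[j]][k];
            PA[indx[j]][k] = tmp;
        }
    }
    double maxdiff = 0.0;
    for (i = 0; i < MAT1; i++) {
        for (j = 0; j < MAT1; j++) {
            double m = 0.0;
            for (k = 0; k < MAT1; k++) m += L[i][k] * U[k][j];
            double d = fabs(m - PA[i][j]);
            if (d > maxdiff) maxdiff = d;
        }
    }
    return maxdiff;    /* ~0 (rounding only) for a correct factorization */
}

Calling it with the factored matrix, the pivot array, and a saved copy of the original input should return a value near zero for a correct decomposition; the buggy version fails this check immediately.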
To badly paraphrase Albert Einstein:
... a man with a watch always knows the exact time, but a man with two is never sure ...
Your code is definitely not producing the correct result, but even if it were, the result with pivoting will not directly correspond to the result without pivoting. In the context of a pivoting solution, what Alpha has really given you is probably the equivalent of this:
        1 0 0          1 0 0          1  3 -2
    P = 0 0 1      L = 2 1 0      U = 0 -2  7
        0 1 0          3 2 1          0  0 -2
which will then satisfy the condition A = P.L.U (where . denotes the matrix product). If I compute the (notionally) same decomposition operation another way (using the LAPACK routine dgetrf via scipy in this case):
In [27]: A
Out[27]:
array([[ 1, 3, -2],
[ 3, 5, 6],
[ 2, 4, 3]])
In [28]: import scipy.linalg as la
In [29]: LU,ipivot = la.lu_factor(A)
In [30]: print LU
[[ 3. 5. 6. ]
[ 0.33333333 1.33333333 -4. ]
[ 0.66666667 0.5 1. ]]
In [31]: print ipivot
[1 1 2]
After a little bit of black magic with ipivot we get
        0 1 0          1       0   0          3  5      6
    P = 1 0 0      L = 0.33333 1   0      U = 0  1.3333 -4
        0 0 1          0.66667 0.5 1          0  0       1
which also satisfies A = P.L.U . Both of these factorizations are correct, but they are different and they won't correspond to a correctly functioning version of the NR code.
So before you can go deciding whether you have the "right" answer, you really should spend a bit of time understanding the actual algorithm that the code you copied implements.
This thread has been viewed 6k times in the past 10 years. I had used NR Fortran and C for many years, and do not share the low opinions expressed here.
I explored the issue you encountered, and I believe the problem in your code is here:
for (k=j+1;k<n;k++) {
    dum=1.0/(a[j*MAT1+j]);
    for (i=j+1;i<n;i++) a[i*MAT1+j] *= dum;
}
while the original uses if (j != n-1) { ... } there. The two are not equivalent: the original divides the sub-column below the pivot by a[j][j] exactly once, whereas the extra k loop repeats that division n-1-j times, so the multipliers end up scaled by the wrong factor whenever more than one iteration runs.
NR's lubksb() does have a small issue in the way it sets up finding the first non-zero element of the right-hand side, but that part can simply be skipped at very low cost, even for a large matrix. With that, both ludcmp() and lubksb(), entered as published, work just fine and, as far as I can tell, perform well.
Here's a complete test code, mostly preserving the notation of NR, with minor simplifications (tested under Ubuntu Linux/gcc):
/* A sample program to demonstrate matrix inversion using the
 * Crout's algorithm from Teukolsky and Press (Numerical Recipes):
 * LU decomposition + back-substitution, with partial pivoting
 * 2022.06 edward.sternin at brocku.ca
 */
#define N 7

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define a(i,j) a[(i)*n+(j)]
/* implied 1D layout is a(0,0), a(0,1), ... a(0,n-1), a(1,0), a(1,1), ... */

void matrixPrint (double *M, int nrow, int ncol) {
    int i,j;
    for (i=0;i<nrow;i++) {
        for (j=0;j<ncol;j++) { fprintf(stderr," %+.3f\t",M[i*ncol+j]); }
        fprintf(stderr,"\n");
    }
}

void die(char msg[]) {
    fprintf(stderr,"ERROR in %s, aborting\n",msg);
    exit(1);
}

void ludcmp(double *a, int n, int *indx) {
    int i, imax, j, k;
    double big, dum, sum, temp;
    double *vv;

    /* i=row index, i=0..(n-1); j=col index, j=0..(n-1) */

    vv=(double *)malloc((size_t)(n * sizeof(double)));
    if (!vv) die("ludcmp: allocation failure");

    for (i = 0; i < n; i++) { /* loop over rows */
        big = 0.0;
        for (j = 0; j < n; j++) {
            if ((temp=fabs(a(i,j))) > big) big=temp;
        }
        if (big == 0.0) die("ludcmp: a singular matrix provided");
        vv[i] = 1.0 / big; /* vv stores the scaling factor for each row */
    }

    for (j = 0; j < n; j++) { /* Crout's method: loop over columns */
        for (i = 0; i < j; i++) { /* except for i=j */
            sum = a(i,j);
            for (k = 0; k < i; k++) { sum -= a(i,k) * a(k,j); }
            a(i,j) = sum; /* Eq. 2.3.12, in situ */
        }
        big = 0.0; /* searching for the largest pivot element */
        for (i = j; i < n; i++) {
            sum = a(i,j);
            for (k = 0; k < j; k++) { sum -= a(i,k) * a(k,j); }
            a(i,j) = sum;
            if ((dum = vv[i] * fabs(sum)) >= big) {
                big = dum;
                imax = i;
            }
        }
        if (j != imax) { /* if needed, interchange rows */
            for (k = 0; k < n; k++){
                dum = a(imax,k);
                a(imax,k) = a(j,k);
                a(j,k) = dum;
            }
            vv[imax] = vv[j]; /* keep the scale factor with the new row location */
        }
        indx[j] = imax;
        if (j != n-1) { /* divide by the pivot element */
            dum = 1.0 / a(j,j);
            for (i = j + 1; i < n; i++) a(i,j) *= dum;
        }
    }
    free(vv);
}

void lubksb(double *a, int n, int *indx, double *b) {
    int i, ip, j;
    double sum;

    for (i = 0; i < n; i++) {
        /* Forward substitution, Eq.2.3.6, unscrambling permutations from indx[] */
        ip = indx[i];
        sum = b[ip];
        b[ip] = b[i];
        for (j = 0; j < i; j++) sum -= a(i,j) * b[j];
        b[i] = sum;
    }
    for (i = n-1; i >= 0; i--) { /* backsubstitution, Eq. 2.3.7 */
        sum = b[i];
        for (j = i + 1; j < n; j++) sum -= a(i,j) * b[j];
        b[i] = sum / a(i,i);
    }
}

int main() {
    double *a,*y,*col,*aa,*res,sum;
    int i,j,k,*indx;

    a=(double *)malloc((size_t)(N*N * sizeof(double)));
    y=(double *)malloc((size_t)(N*N * sizeof(double)));
    col=(double *)malloc((size_t)(N * sizeof(double)));
    indx=(int *)malloc((size_t)(N * sizeof(int)));
    aa=(double *)malloc((size_t)(N*N * sizeof(double)));
    res=(double *)malloc((size_t)(N*N * sizeof(double)));
    if (!a || !y || !col || !indx || !aa || !res) die("main: memory allocation failure");

    srand48((long int) N);
    for (i=0;i<N;i++) {
        for (j=0;j<N;j++) { aa[i*N+j] = a[i*N+j] = drand48(); }
    }
    fprintf(stderr,"\nRandomly generated matrix A = \n");
    matrixPrint(a,N,N);

    ludcmp(a,N,indx);

    for(j=0;j<N;j++) {
        for(i=0;i<N;i++) { col[i]=0.0; }
        col[j]=1.0;
        lubksb(a,N,indx,col);
        for(i=0;i<N;i++) { y[i*N+j]=col[i]; }
    }
    fprintf(stderr,"\nResult of LU/BackSub is inv(A) :\n");
    matrixPrint(y,N,N);

    for (i=0; i<N; i++) {
        for (j=0;j<N;j++) {
            sum = 0;
            for (k=0; k<N; k++) { sum += y[i*N+k] * aa[k*N+j]; }
            res[i*N+j] = sum;
        }
    }
    fprintf(stderr,"\nResult of inv(A).A = (should be 1):\n");
    matrixPrint(res,N,N);

    return(0);
}
