cblas_dgemm works ONLY if beta is a power of two - C

I am totally stumped. I have a fairly large recursive program written in C that calls cblas_dgemm(). The result is verified independently by a program that is known to work correctly.
C = alpha*A*B + beta*C
On repeated tests with random matrices and all possible combinations of parameters, the program gives the correct answer ONLY if abs(beta) is a power of two (1, 2, 4, 8, ...). Any value works for alpha. Any other positive/negative, odd/even value of beta gives the correct answer only between 10% and 30% of the time.
I am using Ubuntu 10.04 and GCC 4.4.x, and I have tried the system-installed blas/cblas/atlas as well as a manually compiled ATLAS.
Any hints or suggestions would be greatly appreciated. I am amazed at the wonderfully generous (and smart) folks lurking at this site.
Thanking you all in advance,
Russ
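For anyone wanting to reproduce the check: the independent verification can be as simple as a naive triple-loop reference for C = alpha*A*B + beta*C. This is a sketch of my own (the helper name `ref_dgemm` is made up, and it assumes row-major storage with no transposes), not the BLAS interface:

```c
#include <stdio.h>

/* Naive reference for C = alpha*A*B + beta*C (row-major, no transposes),
   useful for cross-checking cblas_dgemm on small random matrices.
   A is m x k, B is k x n, C is m x n. */
static void ref_dgemm(int m, int n, int k, double alpha,
                      const double *A, const double *B,
                      double beta, double *C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            double acc = 0.0;
            for (int p = 0; p < k; p++)
                acc += A[i*k + p] * B[p*n + j];
            C[i*n + j] = alpha * acc + beta * C[i*n + j];
        }
}
```

Comparing this element-by-element against the cblas_dgemm result (with a tolerance, not exact equality) quickly shows whether beta is really the variable that matters.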

Two completely unrelated errors conspired to produce an elusive picture and made me look for problems in the wrong place.
(1) There was a simple error in the logic of the function calling dgemm. It would have been easy to fix had I not been chasing the wrong problem.
(2) My double-compare function, a double version of AlmostEqual2sComplement() (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm), used an incorrectly sized integer, resulting in an incorrect TRUE under certain rare circumstances. This was the first time the error bit me!
Thanks again for the useful suggestion of using the scientific method when trying to debug a program.
Russ
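For reference, the fix boiled down to aliasing the 64-bit double with a 64-bit integer instead of a 32-bit int. A sketch of the corrected comparison (the function name is mine; the edge cases discussed in the linked paper, such as comparing values of opposite huge magnitude, still apply):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Double-precision AlmostEqual2sComplement: the key fix is using a
   64-bit integer (int64_t), not a 32-bit int, to alias a double. */
static bool almost_equal_ulps(double a, double b, int64_t max_ulps)
{
    int64_t ia, ib;
    memcpy(&ia, &a, sizeof ia);   /* memcpy avoids strict-aliasing traps */
    memcpy(&ib, &b, sizeof ib);

    /* Map negative doubles so the integer representations are ordered
       (the 64-bit analogue of the paper's 0x80000000 - aInt trick). */
    if (ia < 0) ia = INT64_MIN - ia;
    if (ib < 0) ib = INT64_MIN - ib;

    int64_t diff = ia - ib;
    if (diff < 0) diff = -diff;
    return diff <= max_ulps;      /* adjacent doubles differ by 1 ulp */
}
```

With a 32-bit int, the subtraction only looks at half of the double's bit pattern, which is exactly how two clearly different values can occasionally compare as "almost equal".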

Yes, a full example would be handy. Here is an old example I had hanging around that uses GSL's sgemm variant; it should be easy to adapt to double. Please try it and see if it gives the result shown in the GSL manual ([ 367.76, 368.12 / 674.06, 674.72 ]):
/* from the gsl info documentation in node 'gsl cblas examples' */
/* compile via 'gcc -o $file $file.c -lgslcblas' */
/* edd 15 Nov 2003 */
#include <stdio.h>
#include <gsl/gsl_cblas.h>

int main(void)
{
    int lda = 3;
    float A[] = { 0.11, 0.12, 0.13,
                  0.21, 0.22, 0.23 };
    int ldb = 2;
    float B[] = { 1011, 1012,
                  1021, 1022,
                  1031, 1032 };
    int ldc = 2;
    float C[] = { 0.00, 0.00,
                  0.00, 0.00 };

    /* Compute C = A B */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3, 1.0, A, lda, B, ldb, 0.0, C, ldc);

    printf("[ %g, %g\n", C[0], C[1]);
    printf("  %g, %g ]\n", C[2], C[3]);
    return 0;
}


How do you use GSL's Cholesky Decomposition function with C

I've been using GSL to support some matrix manipulation in C. I'm having a challenge with its Cholesky decomposition function, though, and the documentation in the GSL reference manual is sparse, to say the least. How do I get the lower-triangular matrix output of the function?
Below is my code so far ...
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_linalg.h>

#define rows 6
#define cols 6

double cov[rows*cols] = { 107.3461,  12.0710, -48.3746,  174.7796,  21.0202,  -80.6075,
                           12.0710,   8.0304,  -5.9610,   20.2434,   2.2427,   -9.3129,
                          -48.3746,  -5.9610,  25.2222,  -78.6277,  -9.4400,   36.1789,
                          174.7796,  20.2434, -78.6277,  291.3491,  35.0176, -134.3626,
                           21.0202,   2.2427,  -9.4400,   35.0176,   4.2144,  -16.1499,
                          -80.6075,  -9.3129,  36.1789, -134.3626, -16.1499,   61.9666 };

gsl_matrix_view m = gsl_matrix_view_array(cov, rows, cols);
gsl_linalg_cholesky_decomp1(&m.matrix);
/* ... don't know what to do after this step */
I know the formulas for calculating this manually, but I'd prefer to take advantage of this library instead.
Any help in this regard would be much appreciated.
Got things to work right with David's suggestion and a bit more digging ...
#include <stdio.h>
#include <gsl/gsl_linalg.h>

int main(void)
{
    double cov[9] = {2, -1, 0, -1, 2, -1, 0, -1, 2};
    gsl_matrix_view m = gsl_matrix_view_array(cov, 3, 3);

    /* The decomposition is done in place: L ends up in the lower triangle */
    gsl_linalg_cholesky_decomp1(&m.matrix);

    printf("m = \n");
    gsl_matrix_fprintf(stdout, &m.matrix, "%g");
    return 0;
}

In what situation the output could go wrong like this?

I am trying to solve problem 200B on Codeforces. I tested my code and the output was all right, but when I uploaded it to the online judge system, I failed on the very first test case: it said my output was -0.000000000000 instead of 66.666666666667.
I have compiled and run it with Visual Studio C++ 2010, macOS clang 13.0.0, and Linux GCC 6.3.0, and the output was always 66.666666666667. I am very curious to figure out in what situation the output could be -0.000000000000.
On my computer,
Input:
3
50 50 100
Output:
66.666666666667
On the online judge system,
Input:
3
50 50 100
Participant's output
-0.000000000000
Jury's answer
66.666666666667
Checker comment
wrong answer 1st numbers differ - expected: '66.66667', found: '-0.00000', error = '1.00000'
#include <stdio.h>

int main(void)
{
    int n;
    double sumOrange = 0;
    double sumDrink = 0;

    scanf("%d", &n);
    while (n-- > 0) {
        int m;
        scanf("%d", &m);
        sumOrange += m / 100.0;
        sumDrink++;
    }
    printf("%.12lf\n", (sumOrange / sumDrink) * 100.0);
    return 0;
}
I just don't understand why my output could be -0.000000000000. Please help, thanks.
Update: tested on different versions of GCC (4.9, 5.1, 6.3); the wrong output does not appear. I guess the cause might lie in the specific implementation of printf.
The problem is the format specifier. Since C99, %lf and %f are equivalent in printf (the l length modifier is ignored for the f conversion), but some older C runtimes, apparently including the one used by the judge, do not handle %lf and produce garbage output. Change %.12lf to %.12f. For more information, see:
Correct format specifier for double in printf
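A minimal fixed version, with the loop refactored into a helper of my own naming (`average_percent`) so it is easy to test; the only essential change from the original program is the format string:

```c
#include <stdio.h>

/* Mean orange concentration, in percent, of n drinks whose
   concentrations (as integer percentages) are in m[]. */
static double average_percent(int n, const int *m)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += m[i] / 100.0;     /* orange fraction of each drink */
    return (sum / n) * 100.0;    /* mean, converted back to percent */
}

static void print_answer(int n, const int *m)
{
    printf("%.12f\n", average_percent(n, m));   /* "%.12f", not "%.12lf" */
}
```

For the sample input (3 drinks at 50, 50, 100 percent) this prints 66.666666666667, matching the jury's answer.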

Unexpected behavior from beginner C program

I am working to learn about computing at a more granular level and have started studying both Linux and C at the same time. Smooth sailing until just now.
I am running Linux Mint 17 on kernel 3.16.0-38-generic and using Code::Blocks 13.12 as my IDE.
The following pastes are my practice code using data types, variables, and printf(), and the associated output I see in a terminal. The oddity: in my experiments with decimal places using the float data type, it seems to be skipping values after the 5th and eventually the 4th decimal place.
Am I abusing the process of calling a variable, am I missing a bug in my code, or is this a bug in Code::Blocks? Also, I'm not sure why my code snippet is completely mashed together in the preview, so my apologies for the poor readability.
code to be compiled and executed:
/* Prints a message on the screen */
#include <stdio.h>

char a1 = 'a';               /* was "echar", a typo */
int i1 = 1;
float f1 = 0.123456;

int main()
{
    printf("Testing %c la%sof characters using variables \n", a1, "rge string ");
    printf("This line has the values %d and %f.\n", i1, f1);
    printf("%f can also be displayed as %.5f or %.4f or %.3f or %.2f or %.1f or %.0f \n",
           f1, f1, f1, f1, f1, f1, f1);
    printf("Which is an integer . . .");
    return 0;
}
Output of the compiled and executed code:
Testing a large string of characters using variables
This line has the values 1 and 0.123456.
0.123456 can also be displayed as 0.12346 or 0.1235 or 0.123 or 0.12 or 0.1 or 0
Which is an integer . . .
Process returned 0 (0x0) execution time : 0.002 s
Press ENTER to continue.
Thank you for any help you can provide. I am studying from C Programming Absolute Beginner - by Greg Perry
As was mentioned in the comments, the last digit is being rounded.
If you had this:
float f1 = 0.777777;
The output would be this:
This line has the values 1 and 0.777777.
0.777777 can also be displayed as 0.77778 or 0.7778 or 0.778 or 0.78 or 0.8 or 1
Similarly, if you had this:
float f1 = 0.999888;
You'd get this:
This line has the values 1 and 0.999888.
0.999888 can also be displayed as 0.99989 or 0.9999 or 1.000 or 1.00 or 1.0 or 1
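The carry from that rounding can propagate through every digit. A quick self-contained check (the `fmt` helper is mine; it just wraps snprintf so the formatted string can be inspected):

```c
#include <stdio.h>

/* Format x at the given number of decimal places into out[],
   exactly as printf("%.*f", prec, x) would print it. */
static void fmt(double x, int prec, char *out, size_t n)
{
    snprintf(out, n, "%.*f", prec, x);
}
```

Formatting 0.999888 with three decimal places yields "1.000": the rounded final digit carries all the way up, which is the same effect shown in the outputs above.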

Using LAPACKE_zgetrs with LAPACK_ROW_MAJOR causes illegal memory access

I am trying to solve a linear system using the following code:
#include <stdio.h>
#include <lapacke.h>

int main(void)
{
    lapack_complex_double mat[4];
    lapack_complex_double vec[2];
    lapack_int p[2];

    mat[0] = lapack_make_complex_double( 1, 0);
    mat[1] = lapack_make_complex_double( 1, 0);
    mat[2] = lapack_make_complex_double( 1, 0);
    mat[3] = lapack_make_complex_double(-1, 0);
    vec[0] = lapack_make_complex_double( 1, 0);
    vec[1] = lapack_make_complex_double( 1, 0);

    LAPACKE_zgetrf(LAPACK_ROW_MAJOR, 2, 2, mat, 2, p);
    LAPACKE_zgetrs(LAPACK_ROW_MAJOR, 'N', 2, 1, mat, 2, p, vec, 2);
    printf("%g %g\n", lapack_complex_double_real(vec[0]),
                      lapack_complex_double_imag(vec[0]));
    return 0;
}
For some reason, this causes illegal memory access in LAPACKE_zgetrs (detected by valgrind, and by my big program crashing in zgetrs with "glibc detected corruption or double free"). I did not include it in my SSCCE for brevity, but all LAPACKE routines that return a status return 0.
The same code with LAPACK_COL_MAJOR runs and valgrinds flawlessly.
My lapacke, lapack etc. is self-built for Ubuntu 12.04. I used the following settings in the lapack CMake file:
BUILD_COMPLEX ON
BUILD_COMPLEX16 ON
BUILD_DOUBLE ON
BUILD_SHARED_LIBS ON
BUILD_SINGLE ON
BUILD_STATIC_LIBS ON
BUILD_TESTING ON
CMAKE_BUILD_TYPE Release
LAPACKE ON
LAPACKE_WITH_TMG ON
and the rest (the optimized blas/lapack and xblas) off. There were no errors during the build and all tests succeeded.
Where did I mess up?
Edit: I just tried this with Fedora21 and the packaged lapacke. It did not reproduce the error.
Edit 2: While it does not reproduce the memory errors, it produces a wrong solution, namely (1 + 0i, 1 + 0i), for the above input (should be (1, 0)).
After some more research and overthinking things, I found the culprit:
Using LAPACK_ROW_MAJOR switches the meaning of the ld* leading-dimension parameters. While the leading dimension of a normal Fortran array is the number of rows, switching to ROW_MAJOR changes its meaning to the number of columns. So the correct call (giving correct results) is:
LAPACKE_zgetrs(LAPACK_ROW_MAJOR, 'N', 2, 1, mat, 2, p, vec, 1);
where the second 2 is the number of columns (not rows!) of mat, and the last parameter must equal the number of right-hand sides nrhs (not the number of variables!). I only isolated this particular call because all the other calls in my project dealt with square matrices, where the "wrong" calls have no ill effect due to symmetry.
As usual, if you skip columns at the end, the leading dimension grows accordingly, just as it would for skipped rows in the column-major setting.
Understandably, this is not mentioned in the Fortran documentation. Unfortunately, I saw no such remark in the LAPACKE documentation either, which would have saved me a couple of hours of my life. :)
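The convention is easiest to see in the index mapping itself. A plain-C sketch (the macros are mine, no LAPACK needed):

```c
/* Where element (i, j) of an m x n matrix lives relative to a leading
   dimension ld:
   - column-major (Fortran/LAPACK):  i + j*ld, with ld >= m (rows)
   - row-major (LAPACK_ROW_MAJOR):   i*ld + j, with ld >= n (columns)
   For the 2 x 1 right-hand side vec above, row-major storage therefore
   needs ldb = nrhs = 1, while column-major would need ldb = n = 2. */
#define IDX_COL_MAJOR(i, j, ld) ((i) + (j) * (ld))
#define IDX_ROW_MAJOR(i, j, ld) ((i) * (ld) + (j))
```

Passing ldb = 2 in row-major mode makes LAPACKE stride past the end of the 2-element vec array, which is exactly the out-of-bounds access valgrind reported.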

computing fft and ifft with fftw.h in C

Hi all,
I am using the FFTW C library to compute the frequency spectrum for some signal-processing applications on embedded systems. However, I have run into a slight hindrance in my project.
Below is a simple program I wrote to ensure I am using the FFTW functions correctly. Basically I want to calculate the FFT of a sequence of 12 numbers, then do the IFFT and obtain the same sequence again. If you have fftw3 and gcc installed, this program should work if you compile with:
gcc -g -lfftw3 -lm fftw_test.c -o fftw_test
Currently my FFT length is the same as the size of the input array.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <fftw3.h>

int main(void)
{
    double array[] = {0.1, 0.6, 0.1, 0.4, 0.5, 0, 0.8, 0.7, 0.8, 0.6, 0.1, 0};
    double *out;
    double *err;
    int i, size = 12;
    fftw_complex *out_cpx;
    fftw_plan fft;
    fftw_plan ifft;

    out_cpx = (fftw_complex *) fftw_malloc(sizeof(fftw_complex) * size);
    out = (double *) malloc(size * sizeof(double));
    err = (double *) malloc(size * sizeof(double));

    fft  = fftw_plan_dft_r2c_1d(size, array, out_cpx, FFTW_ESTIMATE);  /* plan for fft  */
    ifft = fftw_plan_dft_c2r_1d(size, out_cpx, out, FFTW_ESTIMATE);    /* plan for ifft */

    fftw_execute(fft);
    fftw_execute(ifft);

    printf("Input: \tOutput:\n");
    for (i = 0; i < size; i++) {
        err[i] = fabs(array[i] - out[i]);   /* fabs, not abs: abs() is for ints */
        printf("%f\t%f\n", array[i], out[i]);
    }

    fftw_destroy_plan(fft);
    fftw_destroy_plan(ifft);
    fftw_free(out_cpx);
    free(err);
    free(out);
    return 0;
}
Which Produces the following output:
Input: Output:
0.100000 1.200000
0.600000 7.200000
0.100000 1.200000
0.400000 4.800000
0.500000 6.000000
0.000000 0.000000
0.800000 9.600000
0.700000 8.400000
0.800000 9.600000
0.600000 7.200000
0.100000 1.200000
0.000000 0.000000
So obviously the IFFT is producing a scaled-up result. The FFTW docs (see: fftw docs about scaling) mention scaling, but in the context of FFTW_FORWARD and FFTW_BACKWARD, whereas I am using the "r2c" and "c2r" transforms. Any insight would be appreciated.
Looking at the documentation for the functions you use, you will see that r2c and c2r are the FFTW_FORWARD and FFTW_BACKWARD transforms, just specialized to real data. Therefore, the scaling information you found also applies here.
Sorry to be pedantic, but your size for out_cpx is incorrect: instead of size elements, it should be size/2 + 1. This is because the FFT of a real signal is Hermitian, so only the first half (plus one) of the complex coefficients is stored. You can verify this by initializing all of out_cpx to some arbitrary value (say 3.14159), running both the forward and backward transforms, and then printing out_cpx from index size/2 + 1 to size - 1: those entries will not have changed.
http://www.fftw.org/fftw3_doc/Real_002ddata-DFT-Array-Format.html#Real_002ddata-DFT-Array-Format
r2c and c2r do essentially the same as the regular Fourier transform; the only difference is that the complex array only needs to hold size/2 + 1 elements, thanks to the Hermitian symmetry. Please take a look at the last paragraph of the FFTW manual's sections on r2c and c2r. The normalization factor is the number of elements of the real array, i.e. the variable size (== 12) in your case, which is exactly why every output value above is 12x the input.
