Arithmetic Exception
//m*n行列Aを用いてy = A*x +b を計算する
void fc(int m, int n, const float *x, const float *A, const float *b, float *y){
int i, j;
for (i = 0; i < m; i++){
y[i] = b[i];
for (j = 0; j < n; j++){
y[i] += A[i * n + j] * x[j];
}
}
}
This is a code that does AX+b calculation of matrixes.
But as in the photo, an arithmetic exception is occurred. Why is this happening? Even though it is multiplication and there is nothing divided by 0.
How can I solve this error?
Sorry that I cannot add the values, or else I will have to add the whole file here. These are the parameters of the Neural Network and I will have to add .dat files here then I will also need other codes that can load those files. Also, I do not know how to bring only numbers from the .dat files, they are kind of weirdly encoded, so.
I will provide all the other information otherwise, so please don't close this question and I really want to know why this happens and how to solve it.
This is also another example of the exception.
Example
What I want to know is how can this happen even where there is nothing divided by 0 in this example. How I can interpret this situation.
according to you image, your matrices size is 100x50 (m,n) that means 5000 items. but you entered A[j*m+i] where 'j' is equal with 55 and 'i' is equal with 0. that means accessing the 5500 item of array which is not allowed.
#include <stdio.h>
void fc(int m, int n, const float *x, const float *A, const float *b, float *y){
int i, j;
for (i = 0; i < m; i++){
y[i] = b[i];
for (j = 0; j < n; j++){
y[i] += A[i * n + j] * x[j];
}
}
}
int main()
{
const float x[3]={1,1,1};
const float *xp=x;
const float A[3][3]={{1,1,1},{1,1,1},{1,1,1}};
const float b[3]={1,1,1};
const float *bp=b;
float y[3]; float *yp=y;
fc (3,3,xp,*A,bp,yp);
printf("%f %f %f ",y[0],y[1],y[2]);
return 0;
}
I've tested the program with imaginary values of 1 for all variables and the matrices size of 3x3 and 3x1. the result was correct with no error. ther result was
4.0000000 4.0000000 4.0000000
So the problem does not arise from the structure of your code. it is definitely comes from a special arithmetic problem.
Related
I am fairly new to C and how arrays and memory allocation works. I'm solving a very simple function right now, vector_average(), which computes the mean value between two successive array entries, i.e., the average between (i) and (i + 1). This average function is the following:
void
vector_average(double *cc, double *nc, int n)
{
//#pragma omp parallel for
double tbeg ;
double tend ;
tbeg = Wtime() ;
for (int i = 0; i < n; i++) {
cc[i] = .5 * (nc[i] + nc[i+1]);
}
tend = Wtime() ;
printf("vector_average() took %g seconds\n", tend - tbeg);
}
My goal is to set int n extremely high, to the point where it actually takes some time to complete this loop (hence, why I am tracking wall time in this code). I'm passing this function a random test function of x, f(x) = sin(x) + 1/3 * sin(3 x), denoted in this code as x_nc, in main() in the following form:
int
main(int argc, char **argv)
{
int N = 1.E6;
double x_nc[N+1];
double dx = 2. * M_PI / N;
for (int i = 0; i <= N; i++) {
double x = i * dx;
x_nc[i] = sin(x) + 1./3. * sin(3.*x);
}
double x_cc[N];
vector_average(x_cc, x_nc, N);
}
But my problem here is that if I set int N any higher than 1.E5, it segfaults. Please provide any suggestions for how I might set N much higher. Perhaps I have to do something with malloc, but, again, I am new to all of this stuff and I'm not quite sure how I would implement this.
-CJW
A function only has 1M stack memory on Windows or other system. Obviously, the size of temporary variable 'x_nc' is bigger than 1M. So, you should use heap to save data of x_nc:
int
main(int argc, char **argv)
{
int N = 1.E6;
double* x_nc = (double*)malloc(sizeof(dounble)*(N+1));
double dx = 2. * M_PI / N;
for (int i = 0; i <= N; i++) {
double x = i * dx;
x_nc[i] = sin(x) + 1./3. * sin(3.*x);
}
double* x_cc = (double*)malloc(sizeof(double)*N);
vector_average(x_cc, x_nc, N);
free(x_nc);
free(x_cc);
return 0;
}
There is a serie for the exp function whitch looks like this:
exp(x) = (x^0)/0! + (x^1)/1! + (x^2)/2! + (x^3)/3! + ···. And I'm trying to compute it for different values of x, checking my results with a calculator and I found that for big values, 20 for example, my results stop increasing and get stuck in a value that is almost the real one. I get 485165184.00 and the real value is 485165195.4.
I must do this code in a for cycle or a recursive function, since it is a homework assignment.
My code looks as following
#include <stdio.h>
#define N 13
#define xi 3
double fun7(int n, int m){
int i;
double res=1, aux=0;
for(i=1, aux=1; i<(n+1); i++){
res += aux;
aux *= m;
aux /= i;
}
return res-1;
}
int main() {
int a, b, pot, x[xi];
float R[N][xi];
x[0] = 5;
x[1] = 10;
x[2] = 20;
for(b=0; b<xi; b++){
for (a=0, pot=1; a<N; a++){
R[a][b] = fun7(pot, x[b]);
pot *= 2;
}
}
for(b=0; b<xi; b++){
for (a=0, pot=1; a<N; a++){
printf("%d\t%f\n", pot, R[a][b]);
pot *= 2;
}
printf("\n");
}
return 0;
}
The float data type can normally represent numbers with a tad more than 7 decimal digits of precision.
485165184 has 9 decimal digits. The last two digits are just meaningless noise as far as float goes. You really should be showing 4.851652e8, which is the correct value for exp(20) with the given level of precision.
If you want to increase precision, try using double or long double data types.
Assume that the dimensions are very large (up to 1 billion elements in a matrix). How would I implement a cache oblivious algorithm for matrix-vector product? Based on wikipedia I will need to recursively divide and conquer however I feel like there would be a lot of overhead.. Would it be efficient to do so?
Follow up question and answer: OpenMP with matrices and vectors
So the answer to the question, "how do I make this basic linear algebra operation fast", is always and everywhere to find and link to a tuned BLAS library for your platform. Eg, GotoBLAS (whose work is being continued in OpenBLAS), or the slower autotuned ATLAS, or commercial packages like Intel's MKL. Linear algebra is so fundamental to so many other operations that enormous amounts of effort goes into optimizing these packages for various platforms, and there's just no chance you're going to come up with something in a few afternoon's work that will compete. The particular subroutine calls you're looking for for general dense matrix-vector multiplicaiton is SGEMV/DGEMV/CGEMV/ZGEMV.
Cache-oblivious algorithms, or autotuning, are for when you can't be bothered tuning for the specific cache architecture of your system - which might be fine, normally, but since people are willing to do that for BLAS routines, and then make the tuned results available, means that you're best off just using those routines.
The memory access pattern for GEMV is straightforward enough that you don't really need divide and conquer (same for the standard case of matrix transpose) - you just find the cache blocking size and use it. In GEMV (y = Ax), you still have to scan through the entire matrix once, so there's nothing to be done for reuse (and thus effective cache use) there, but you can try reuse x as much as possible so you load it once instead of (number of rows) times - and you still want access to A to be cache friendly. So the obvious cache blocking thing to do is to break along blocks:
A x -> [ A11 | A12 ] | x1 | = | A11 x1 + A12 x2 |
[ A21 | A22 ] | x2 | | A21 x1 + A22 x2 |
And you can certainly do that recursively. But doing a naive implementation, it's slower than the simple double-loop, and way slower than a proper SGEMV library call:
$ ./gemv
Testing for N=4096
Double Loop: time = 0.024995, error = 0.000000
Divide and conquer: time = 0.299945, error = 0.000000
SGEMV: time = 0.013998, error = 0.000000
The code follows:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include "mkl.h"
float **alloc2d(int n, int m) {
float *data = malloc(n*m*sizeof(float));
float **array = malloc(n*sizeof(float *));
for (int i=0; i<n; i++)
array[i] = &(data[i*m]);
return array;
}
void tick(struct timeval *t) {
gettimeofday(t, NULL);
}
/* returns time in seconds from now to time described by t */
double tock(struct timeval *t) {
struct timeval now;
gettimeofday(&now, NULL);
return (double)(now.tv_sec - t->tv_sec) + ((double)(now.tv_usec - t->tv_usec)/1000000.);
}
float checkans(float *y, int n) {
float err = 0.;
for (int i=0; i<n; i++)
err += (y[i] - 1.*i)*(y[i] - 1.*i);
return err;
}
/* assume square matrix */
void divConquerGEMV(float **a, float *x, float *y, int n,
int startr, int endr, int startc, int endc) {
int nr = endr - startr + 1;
int nc = endc - startc + 1;
if (nr == 1 && nc == 1) {
y[startc] += a[startr][startc] * x[startr];
} else {
int midr = (endr + startr+1)/2;
int midc = (endc + startc+1)/2;
divConquerGEMV(a, x, y, n, startr, midr-1, startc, midc-1);
divConquerGEMV(a, x, y, n, midr, endr, startc, midc-1);
divConquerGEMV(a, x, y, n, startr, midr-1, midc, endc);
divConquerGEMV(a, x, y, n, midr, endr, midc, endc);
}
}
int main(int argc, char **argv) {
const int n=4096;
float **a = alloc2d(n,n);
float *x = malloc(n*sizeof(float));
float *y = malloc(n*sizeof(float));
struct timeval clock;
double eltime;
printf("Testing for N=%d\n", n);
for (int i=0; i<n; i++) {
x[i] = 1.*i;
for (int j=0; j<n; j++)
a[i][j] = 0.;
a[i][i] = 1.;
}
/* naive double loop */
tick(&clock);
for (int i=0; i<n; i++) {
y[i] = 0.;
for (int j=0; j<n; j++) {
y[i] += a[i][j]*x[j];
}
}
eltime = tock(&clock);
printf("Double Loop: time = %lf, error = %f\n", eltime, checkans(y,n));
for (int i=0; i<n; i++) y[i] = 0.;
/* naive divide and conquer */
tick(&clock);
divConquerGEMV(a, x, y, n, 0, n-1, 0, n-1);
eltime = tock(&clock);
printf("Divide and conquer: time = %lf, error = %f\n", eltime, checkans(y,n));
/* decent GEMV implementation */
tick(&clock);
float alpha = 1.;
float beta = 0.;
int incrx=1;
int incry=1;
char trans='N';
sgemv(&trans,&n,&n,&alpha,&(a[0][0]),&n,x,&incrx,&beta,y,&incry);
eltime = tock(&clock);
printf("SGEMV: time = %lf, error = %f\n", eltime, checkans(y,n));
return 0;
}
I am writing a n x n matrix multiplication program in C where a[][] and b[][] are the inputs and x[][] is the output. a, b and x are malloc'd but I am unsure how to pass the pointers to the multiplication function correctly. below is an outline of what i am trying to do
void multiplication(float a, float b, float x, int n);
void main() {
float **a, **b, **x;
int n; // size of arrays (n x n)
multiplication(a, b, x, n);
}
void multiplication(float a, float b, float x, int n) {
// function
}
You want void multiplication(float *a, float *b, float *x, int n);. Note that generally you should use size_t for indexes and array sizes, and double as your preferred floating point type unless you have specific reason to use float.
Each of a, b and x point to contiguous float values, you will want to dereference these using a[n * x + y].
C99 introduces all kinds of interesting optimization possibilities on top of this, all of which you pretty much can't rely upon in any compiler I know of:
Variable Length Arrays in GCC
Arrays in XL C/C++ V7.0 (for AIX)
With those, something like this should be possible:
void multiplication(size_t len; // <- semicolon not a mistake
double a[len][restrict const len],
double b[len][restrict const len],
double c[len][restrict const len]);
This pedantic construction would indicate to the compiler that the length of the arrays are are the same, they're 2D, and the sizes are indicated from the calling code at runtime. Furthermore all the arrays are cacheable as they don't alias one another.
One can only dream that C continues to be advanced, C99 still isn't fully supported, and many other improvements haven't become mainstream.
you have to pass the address of first element of both matrix in multiplication method
actually the thing is that the elements of an array is arranged like queue means one element after another. so if you know the address of first element then you just increase the index number and you easily get all member of that array.
check this
it might be help you
Well, you must understand pointers for doing this kind of things in C. Here's a simple code:
int n = 10;
float * multiply ( float *a, float *b ) {
float *ans;
int i, j, k;
ans = malloc(sizeof(float)*n*n);
for (i=0; i<n; ++i)
for (j=0; j<n; ++j) {
ans[i*n+j] = 0.0;
for (k=0; k<n; ++k)
ans[i*n+j] += a[i*n+k] * b[k*n+j];
}
return ans;
}
int main() {
float *a, *b, *ans;
a = malloc(sizeof(float)*n*n);
input(&a);
b = malloc(sizeof(float)*n*n);
input(&b);
ans = multiply(a,b);
output(ans);
return 0;
}
If you have trouble understanding the code, please try to brush up your pointer skills. And you can always ask us.
Here is a nice easy way to pass dynamically allocated arrays to a function.
#include <stdio.h>
#include <stdlib.h>
void Function(int ***Array);
int main()
{
int i, j, k, n=10;
//Declare array, and allocate memory using malloc. ( Dimensions will be 10 x 10 x 10)
int ***Array=(int***)malloc(n*sizeof(int**));
for (i=0; i<n; i++)
{
Array[i]=(int**)malloc(n*sizeof(int*));
for (j=0; j<n; j++)
{
Array[i][j]=(int*)malloc(n*sizeof(int));
}
}
//Initialize array in a way that allows us to check it easily (i*j+k).
for (i=0; i<n; i++)
{
for (j=0; j<n; j++)
{
for (k=0; k<n; k++)
{
Array[i][j][k]=i*j+k;
}
}
}
//Check array is initialized correctly.
printf("%d\n", Array[4][5][6]);
//Pass array to Function.
Function(Array);
//Check that Function has correctly changed the element.
printf("%d\n", Array[4][5][6]);
return 0;
}
void Function(int ***Array)
{
//Check that Function can access values correctly.
printf("%d\n", Array[4][5][6]);
//Change an element.
Array[4][5][6]=1000;
}
I know this is not specific to your matrix multiplication, but it should demonstrate how to pass the array to the function. It is quite likely that your function would need to know the dimensions of the array, so pass those to it as well... void Function(int ***Array, int n) and call the function as Function(Array, n);
I'm generating two matrices using the following function (note some code is omitted):
srand(2007);
randomInit(h_A_data, size_A);
void randomInit(float* data, int size)
{
int i;
for (i = 0; i < size; ++i){
data[i] = rand() / (float)RAND_MAX;
}
}
This is called for matrix A and B. This populates the matrices with 0.something values, e.g. 0.748667. I then perform a matrix multiplication using a CPU. I compare the result to a GPU implementation via OpenCL. The resulting matrix has values in the range 20.something, e.g. 23.472757. Both the CPU and the GPU give the same result. The CPU implementation is taken from the Cuda toolkit distrib by nvidia:
void computeGold(float* C, const float* A, const float* B, unsigned int hA, unsigned int wA, unsigned int wB)
{
unsigned int i;
unsigned int j;
unsigned int k;
for (i = 0; i < hA; ++i)
for (j = 0; j < wB; ++j) {
double sum = 0;
for (k = 0; k < wA; ++k) {
double a = A[i * wA + k];
double b = B[k * wB + j];
sum += a * b;
}
C[i * wB + j] = (float)sum;
}
}
The weird thing is, all three matrices in memory are of the same size, i.e. sizeof(float)*size_A, or *size_B for matrix B etc. When I dump them to the disk, the file for the result stored in matrix C (the multiplied matrix) is bigger than matrix A and B.
Even more critical, for my application I'm transferring these over a network via a socket. In terms of the raw number of bytes, all matrices are the same, and yet it takes longer to transfer matrix C over the network. The problem is extrapolated for large matrix sizes. Why is this?
UPDATE/EDIT:
fprintf(matrix_c_file,"\n\nMatrix C\n");
for(i = 0; i < size_C; i++)
{
fprintf(matrix_c_file,"%f ", h_C_data[i]);
}
fprintf(matrix_c_file,"\n");
When matrix A and B contain only zero's, all three (matrix A, B and C) are the same size on disk.
I think that lijie has the correct (albeit terse) answer in the comments. The %f format specifier can result in a string with variable width. Consider the following C code:
printf("%f\n", 0.0);
printf("%f\n", 3.1415926535897932384626433);
printf("%f\n", 20.53);
printf("%f\n", 20.5e38);
which produces:
0.000000
3.141593
20.530000
2050000000000000019963732141023730597888.000000
All of the output has the same number of digits after the decimal point (6 by default), but a variable number to the left of the decimal point. If you need the textual representation of your matrix to be a consistent size and you don't mind sacrificing some precision, you can use the %e format specifier instead to force an exponential representation like 2.345e12.