I am implementing Strassen's matrix multiplication algorithm as part of an assignment. The logic looks correct to me, but the program crashes with a segmentation fault and I can't see why.
I call it from main as strassen(0,n,0,n);, where n is a power of two supplied by the user and is the dimension of the matrices (2D arrays).
It works for n=4, but n=8, 16 and 32 all segfault.
The code is given below.
void strassen(int p, int q, int r, int s)
{
int p1,p2,p3,p4,p5,p6,p7;
if(((q-p) == 2)&&((s-r) == 2))
{
p1 = ((a[p][r] + a[p+1][r+1])*(b[p][r] + b[p+1][r+1]));
p2 = ((a[p+1][r] + a[p+1][r+1])*b[p][r]);
p3 = (a[p][r]*(b[p][r+1] - b[p+1][r+1]));
p4 = (a[p+1][r+1]*(b[p+1][r] - b[p][r]));
p5 = ((a[p][r] + a[p][r+1])*b[p+1][r+1]);
p6 = ((a[p+1][r] - a[p][r])*(b[p][r] +b[p][r+1]));
p7 = ((a[p][r+1] - a[p+1][r+1])*(b[p+1][r] + b[p+1][r+1]));
c[p][r] = p1 + p4 - p5 + p7;
c[p][r+1] = p3 + p5;
c[p+1][r] = p2 + p4;
c[p+1][r+1] = p1 + p3 - p2 + p6;
}
else
{
strassen(p, q/2, r, s/2);
strassen(p, q/2, s/2, s);
strassen(q/2, q, r, s/2);
strassen(q/2, q, s/2, s);
}
}
Some of the recursive calls in your else block recurse infinitely (at least the second and the fourth; I didn't check the others). This is easy to verify with pen and paper:
e.g.
strassen(p, q/2, s/2, s) for 0,8,0,8 will yield at each level of recursion:
1) 0, 4, 4, 8
2) 0, 2, 4, 8
3) 0, 1, 4, 8
4) 0, 0, 4, 8
5) 0, 0, 4, 8
...
and since none of those results pass your
if(((q-p) == 2)&&((s-r) == 2))
test, the function will keep recursing (and, I suspect, branching, since the fourth call has the same problem) until the stack is exhausted, causing the segmentation fault.
Anyway, if what you are trying to do in the else block is to recursively bisect the matrix, a better attempt would be something like:
strassen(p, (q+p)/2, r, (r+s)/2);
strassen(p, (q+p)/2, (r+s)/2, s);
strassen((q+p)/2,q, (r+s)/2, s);
strassen((q+p)/2,q, r, (r+s)/2);
(keep in mind that I didn't check this code, though)
void strassen(int p, int q, int r, int s)
{
int p1,p2,p3,p4,p5,p6,p7;
if(q-p == 2 && s-r == 2)
{
p1 = (a[p][r] + a[p+1][r+1]) * (b[p][r] + b[p+1][r+1]);
p2 = (a[p+1][r] + a[p+1][r+1]) * b[p][r];
p3 = a[p][r] * (b[p][r+1] - b[p+1][r+1]);
p4 = a[p+1][r+1] * (b[p+1][r] - b[p][r]);
p5 = (a[p][r] + a[p][r+1]) * b[p+1][r+1];
p6 = (a[p+1][r] - a[p][r]) * (b[p][r] +b[p][r+1] );
p7 = (a[p][r+1] - a[p+1][r+1]) * (b[p+1][r] + b[p+1][r+1]);
c[p][r] = p1 + p4 - p5 + p7;
c[p][r+1] = p3 + p5;
c[p+1][r] = p2 + p4;
c[p+1][r+1] = p1 + p3 - p2 + p6;
}
else
{
if (q/2-p >= 2 && s/2-r >= 2) strassen(p, q/2, r, s/2);
if (q/2-p >= 2 && s-s/2 >= 2) strassen(p, q/2, s/2, s);
if (q-q/2 >= 2 && s/2-r >= 2) strassen(q/2, q, r, s/2);
if (q-q/2 >= 2 && s-s/2 >= 2) strassen(q/2, q, s/2, s);
}
}
But an easier recursion stopper would be at the beginning of the function, like:
{
int p1,p2,p3,p4,p5,p6,p7;
if(q-p < 2 || s-r < 2) return;
if(q-p == 2 && s-r == 2)
{ ...
I want to find x, the number of rows in the left group that makes the configuration acceptable under the procedure, and output -1 if there is no acceptable configuration. The procedure is: left a(x), middle b(x+1) and right c(x+2). Does anyone have a better solution than mine?
#include<stdio.h>
int main(void)
{
int chairs,a,b,c,result;
scanf("%d %d %d %d", &chairs,&a ,&b , &c);
for(int i=1; i<=chairs; i++)
{
result= (a*(i)) + (b*(i+1)) + (c*(i+2));
if(chairs == result)
{
printf("%d", i);
break;
}
else if(i == chairs && chairs != result)
printf("-1");
}
}
This is rather a math problem.
a * x + b * (x + 1) + c * (x + 2) = chair
a * x + b * x + b + c * x + 2 * c = chair
a * x + b * x + c * x = chair - b - 2 * c
x * (a + b + c) = chair - b - 2 * c
x = (chair - b - 2 * c) / (a + b + c)
This can be solved in constant time with a single division. There is no solution if a + b + c == 0.
It boils down to a math question:
if ((a + b + c) == 0) return -1;
X = (chairs - b - 2*c) / (a + b + c); Y = (chairs - b - 2*c) % (a + b + c);
if (X > 0 && X <= chairs && Y == 0)
return X;
else
return -1;
int vector[] = { 28, 41, 7 };
int *p0 = vector;
int *p1 = vector + 1;
int *p2 = vector + 2;
I know the output of
printf("%p, %p, %p\n", (void *)p0, (void *)p1, (void *)p2);
is something like 100, 104, 108,
but why is the output of
printf("p2-p0: %td\n", p2 - p0);
printf("p2-p1: %td\n", p2 - p1);
printf("p0-p1: %td\n", p0 - p1);
2, 1, -1
and not 8, 4, -4?
When you subtract two pointers (which must have the same type, otherwise the subtraction makes no sense), the result is the difference in element indexes, not the difference between the addresses:
type * p1 = ...;
type * p2 = ...;
(p1 - p2) == (((char *) p1) - ((char *) p2)) / sizeof(type)
The same applies to vector + n: it gives the address of the element at index n, not ((char *) vector) + n. So
type * p = ...;
int n = ...;
((char *) (p + n)) == (((char *) p) + n * sizeof(type))
How can I multiply two numbers without using the '*' or '/' operators?
This differs from the other question because I need it to work for floats too.
I need to handle negative and positive values, whole numbers and fractions, every possible case of multiplication.
My idea is to handle the integer part first (5, 3, 8, etc.) and then the fractional part (0.5, 0.33333, etc.).
float fraction = (float)x - (int)x; // I can calculate the fraction with this
I think I can scale the fraction up until it is an integer and then multiply it as an integer, but the problem is how to scale the result back down afterwards (divide by ten without dividing).
With this (call this function repeatedly until the returned number is bigger than 0):
float multiBy10(float a)
{
float backup = a;
for(int i = 0; i < 9; i++, a += backup); // a starts at backup, so 9 more additions give 10 * backup
return a;
}
I saw this here, but it is for int, and the bit manipulation doesn't work on float numbers:
int divs10(int n)
{
int q, r;
n = n + (n>>31 & 9);
q = (n >> 1) + (n >> 2);
q = q + (q >> 4);
q = q + (q >> 8);
q = q + (q >> 16);
q = q >> 3;
r = n - q*10;
return q + ((r + 6) >> 4);
// return q + (r > 9);
}
I am trying to implement a 2D Navier-Stokes solver using CUDA, with Jacobi's method to solve the system of difference equations. I divide the grid into 4x4 blocks of 16x16 threads each. Since every inner point of my matrix (of dimension 64x64) needs its top, bottom, left and right neighbours to compute its new value, I create a shared matrix of dimension 18x18 for every block. I fill it as follows: the thread with indices (0, 0) writes its value into element (1, 1) of the shared matrix and also reads the element above it and the one to its left, provided this access does not exceed the boundary. Once this read is done, I update the values of all the internal points and write them back to global memory.
I end up getting garbage values in the matrix pn, even though all the values are initialized correctly. I honestly cannot see where I'm going wrong. Can someone help me with this?
My kernel -
__global__ void red_psi (float *psi_o, float *psi_n, float *e, float *omega, float l1)
{
// m = n = 64
int i1 = blockIdx.x;
int j1 = blockIdx.y;
int i2 = threadIdx.x;
int j2 = threadIdx.y;
int i = (i1 * blockDim.x) + i2; // Actual row of the element
int j = (j1 * blockDim.y) + j2; // Actual column of the element
int l = i * n + j;
// e_XX --> variables refer to expanded shared memory locations in order to accommodate halo elements
//Current Local ID with radius offset.
int e_li = i2 + 1;
int e_lj = j2 + 1;
// Variable pointing at top and bottom neighbouring location
int e_li_prev = e_li - 1;
int e_li_next = e_li + 1;
// Variable pointing at left and right neighbouring location
int e_lj_prev = e_lj - 1;
int e_lj_next = e_lj + 1;
__shared__ float po[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
__shared__ float pn[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
__shared__ float oo[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
//__shared__ float ee[BLOCK_SIZE + 2][BLOCK_SIZE + 2];
if (i2 < 1) // copy top and bottom halo
{
//Copy Top Halo Element
if (blockIdx.y > 0) // Boundary check
{
po[i2][e_lj] = psi_o[l - n];
//pn[i2][e_lj] = psi_n[l - n];
oo[i2][e_lj] = omega[l - n];
//printf ("i_pn[%d][%d] = %f\n", i2, e_lj, oo[i2][e_lj]);
}
//Copy Bottom Halo Element
if (blockIdx.y < (gridDim.y - 1)) // Boundary check
{
po[1 + BLOCK_SIZE][e_lj] = psi_o[l + n];
//pn[1 + BLOCK_SIZE][e_lj] = psi_n[l + n];
oo[1 + BLOCK_SIZE][e_lj] = omega[l + n];
//printf ("j_pn[%d][%d] = %f\n", 1 + BLOCK_SIZE, e_lj, oo[1 + BLOCK_SIZE][e_lj]);
}
}
if (j2 < 1) // copy left and right halo
{
if (blockIdx.x > 0) // Boundary check
{
po[e_li][j2] = psi_o[l - 1];
//pn[e_li][j2] = psi_n[l - 1];
oo[e_li][j2] = omega[l - 1];
//printf ("k_pn[%d][%d] = %f\n", e_li, j2, oo[e_li][j2]);
}
if (blockIdx.x < (gridDim.x - 1)) // Boundary check
{
po[e_li][1 + BLOCK_SIZE] = psi_o[l + 1];
//pn[e_li][1 + BLOCK_SIZE] = psi_n[l + 1];
oo[e_li][1 + BLOCK_SIZE] = omega[l + 1];
//printf ("l_pn[%d][%d] = %f\n", e_li, 1 + BLOCK_SIZE, oo[e_li][BLOCK_SIZE + 1]);
}
}
// copy current location
po[e_li][e_lj] = psi_o[l];
//pn[e_li][e_lj] = psi_n[l];
oo[e_li][e_lj] = omega[l];
//printf ("o_pn[%d][%d] = %f\n", e_li, e_lj, oo[e_li][e_lj]);
__syncthreads ();
// Checking whether we have an internal point.
if ((i >= 1 && i < (m - 1)) && (j >= 1 && j < (n - 1)))
{
//printf ("Calculating for - (%d, %d)\n", i, j);
pn[e_li][e_lj] = 0.25 * (po[e_li_next][e_lj] + po[e_li_prev][e_lj] + po[e_li][e_lj_next] + po[e_li][e_lj_prev] + h*h*oo[e_li][e_lj]);
//printf ("n_pn[%d][%d] (%d, %d), a(%d, %d) = %f\n", e_li_prev, e_lj, i1, j1, i, j, po[e_li_prev][e_lj]);
pn[e_li][e_lj] = po[e_li][e_lj] + 1.0 * (pn[e_li][e_lj] - po[e_li][e_lj]);
__syncthreads ();
psi_n[l] = pn[e_li][e_lj];
e[l] = po[e_li][e_lj] - pn[e_li][e_lj];
}
}
This is how I invoke the kernel -
dim3 threadsPerBlock (4, 4);
dim3 numBlocks (4, 4);
red_psi<<<numBlocks, threadsPerBlock>>> (d_xn, d_xx, d_e, d_w, l1);
(d_xx, d_xn, d_e, d_w are all float arrays of size 4096)
I found the problem: I had switched blockIdx.x and blockIdx.y when copying the top / bottom and the left / right halo elements.
#include <stdio.h>
int main()
{
int n;
while ( scanf( "%d", &n ) != EOF ) {
double sum = 0,k;
if( n > 5000000 || n<=0 ) // range check on the input
break;
for ( int i = 1; i <= n; i++ ) {
k = (double) 1 / i;
sum += k;
}
/*
for ( int i = n; i > 0; i-- ) {
k = 1 / (double)i;
sum += k;
}
*/
printf("%.12lf\n", sum);
}
return 0;
}
Why do I get different answers from the two loops? Is this a floating-point error? When I input 5000000 the sum is 16.002164235299, but when I use the other for loop (the commented-out one) I get 16.002164235300.
Because floating point math is not associative:
i.e. (a + b) + c is not necessarily equal to a + (b + c)
I also bumped into a + b + c issue. Totally agreed with ArjunShankar.
// Here A != B in general case
float A = ((a + b) + c);
float B = ((a + c) + b);
Most floating-point operations lose some data in the mantissa, even when the operands themselves fit in it exactly (numbers like 0.5 or 0.25).
In fact, I was quite happy to find the cause of a bug in my application. I have written a short reminder article with a detailed explanation:
http://stepan.dyatkovskiy.com/2018/04/machine-fp-partial-invariance-issue.html
Below is the C example. Good luck!
example.c
#include <stdio.h>
// Helpers declaration, for implementation scroll down
float getAllOnes(unsigned bits);
unsigned getMantissaBits();
int main() {
// Determine mantissa size in bits
unsigned mantissaBits = getMantissaBits();
// Considering mantissa has only 3 bits, we would then get:
// a = 0b10 m=1, e=1
// b = 0b110 m=11, e=1
// c = 0b111 m=111, e=0
// a + b = 0b1000, m=100, e=1
// a + c = 0b1001, truncated to 0b1000, m=100, e=1
// a + b + c result: 0b1000 + 0b1000 = 0b10000, m=100, e=2
// a + c + b result: 0b1000 + 0b110 = 0b1110, m=111, e=1
float a = 2,
b = getAllOnes(mantissaBits) - 1,
c = b + 1;
float ab = a + b;
float ac = a + c;
float abc = a + b + c;
float acb = a + c + b;
printf("\n"
"FP partial invariance issue demo:\n"
"\n"
"Mantissa size = %i bits\n"
"\n"
"a = %.1f\n"
"b = %.1f\n"
"c = %.1f\n"
"(a+b) result: %.1f\n"
"(a+c) result: %.1f\n"
"(a + b + c) result: %.1f\n"
"(a + c + b) result: %.1f\n"
"---------------------------------\n"
"diff(a + b + c, a + c + b) = %.1f\n\n",
mantissaBits,
a, b, c,
ab, ac,
abc, acb,
abc - acb);
return 0;
}
// Helpers
float getAllOnes(unsigned bits) {
return (unsigned)((1 << bits) - 1);
}
unsigned getMantissaBits() {
unsigned sz = 1;
unsigned unbeleivableHugeSize = 1024;
float allOnes = 1;
for (;sz != unbeleivableHugeSize &&
allOnes + 1 != allOnes;
allOnes = getAllOnes(++sz)
) {}
return sz-1;
}