Find inverse of matrix using b^nth power - c

I've searched for hours and spent many more trying to figure out how to fix this problem. I need to find the inverse of a predefined matrix using
A^-1 = I + (B + B^2 + ... + B^20) where B = I-A.
void invA(double a[][3], double id[][3], double z[][3])
{
int i, j, n, k;
double pb[3][3] = {1.,0.,0.,0.,1.,0.,0.,0.,1.};
double temp[3][3] = {1.,0.,0.,0.,1.,0.,0.,0.,1.};
double b[3][3];
temp[i][j] = 0;
b[i][j] = 0;
for(i = 0; i < 3; i++)
for (j = 0; j < 3; j++)
b[i][j] = id[i][j] - a[i][j];
for (n = 0; n < 20; n++) //run loop n times
{
for (i = 0; i < 3; i++) //find b to the power 20
for (j = 0; j < 3; j++)
for (k = 0; k < 3; k++)
temp[i][j] += pb[i][k] * b[k][j];
for (i = 0; i < 3; i++) //allocate pb from temp
for (j = 0; j < 3; j++)
pb[i][j] = temp[i][j];
for (i = 0; i < 3; i++) //summing b n time
for (j = 0; j < 3; j++) //to find inverse
z[i][j] = z[i][j] + pb[i][j];
}
}
Matrix a is the defined matrix, id is the identity and z is the inverse (result). I can't seem to figure out where I've gone wrong.

You have a few problems.
First, temp[i][j] = 0; and b[i][j] = 0; at the beginning of the function use the uninitialized variables i and j. The behaviour is undefined: the writes can land anywhere, inside or outside the arrays.
Then, temp must be reset to the zero matrix at each iteration, because the multiplication loop accumulates into it with +=. As written, temp still holds the previous product (and the identity on the first pass), so whatever your code computes, it is not a power of b.
Finally, unless z is initialized to I by the caller, you are missing the initial term of the series.
All that said, I highly recommend factoring most of the loops out into functions: matAdd() and matMult(). Once they are unit tested, the rest is much simpler.
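A minimal sketch of that refactoring, under a few assumptions: the matAdd() and matMult() signatures are illustrative, invA() builds the identity itself instead of taking id as a parameter, and the number of terms is passed in. The series only approximates the inverse when the powers of B = I - A shrink towards zero.

```c
#define DIM 3

/* out = x + y, element by element; out may alias x or y */
static void matAdd(double x[DIM][DIM], double y[DIM][DIM], double out[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            out[i][j] = x[i][j] + y[i][j];
}

/* out = x * y; out must not alias x or y */
static void matMult(double x[DIM][DIM], double y[DIM][DIM], double out[DIM][DIM])
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            double sum = 0.0;            /* fresh accumulator for every element */
            for (int k = 0; k < DIM; k++)
                sum += x[i][k] * y[k][j];
            out[i][j] = sum;
        }
}

/* z = I + B + B^2 + ... + B^terms, where B = I - a */
void invA(double a[DIM][DIM], double z[DIM][DIM], int terms)
{
    double b[DIM][DIM], pb[DIM][DIM], temp[DIM][DIM];

    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            double id = (i == j) ? 1.0 : 0.0;
            b[i][j] = id - a[i][j];      /* B = I - A */
            pb[i][j] = id;               /* running power, starts at B^0 = I */
            z[i][j] = id;                /* initial term of the series */
        }

    for (int n = 0; n < terms; n++) {
        matMult(pb, b, temp);            /* temp = B^(n+1) */
        for (int i = 0; i < DIM; i++)
            for (int j = 0; j < DIM; j++)
                pb[i][j] = temp[i][j];
        matAdd(z, pb, z);                /* z += B^(n+1) */
    }
}
```

Calling invA(a, z, 20) corresponds to the formula in the question. Because matMult() writes each element from a local accumulator, there is no temp matrix to forget to clear.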

Related

How to convert this C code into Assembly using intrinsics?

This is my first time using intrinsics, and I have to convert the C code below into C code that uses intrinsics. I don't know where to start.
void slow_routine(float alpha, float beta){
unsigned int i,j;
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
A[i][j] = A[i][j] + u1[i] * v1[j] + u2[i] * v2[j];
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
x[i] = x[i] + beta * A[j][i] * y[j];
for (i = 0; i < N; i++)
x[i] = x[i] + z[i];
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
w[i] = w[i] + alpha * A[i][j] * x[j];
}
Any of the following can be used: x86-64 SSE/SSE2/SSE4/AVX/AVX2
Please help I have been trying for hours and I'm very stuck.

Why does my code return -nan in visual studio, but not in Linux?

My Gaussian elimination code prints -nan in Visual Studio, but not on Linux.
The Linux results are also suspect: in Gauss_Eli, no matter how far I increase the upper bound of k in the for loops, the function keeps running and no segmentation fault occurs.
What is wrong with my code?
float ** Gauss_Eli(float ** matrix, int n) {
// -----------------------------------------------------
// | |
// | Eliminate elements except (i, i) element |
// | |
// -----------------------------------------------------
// Eliminate elements at lower triangle part
for (int i = 0; i < n; i++) {
for (int j = i + 1; j < n; j++) {
for (int k = 0; k < n + 1; k++) {
float e;
e = matrix[i][k] * (matrix[j][i] / matrix[i][i]);
matrix[j][k] -= e;
}
}
}
// Eliminate elements at upper triangle part
for (int i = n - 1; i >= 0; i--) {
for (int j = i - 1; j >= 0; j--) {
for (int k = 0; k < n + 1; k++) {
float e;
e = matrix[i][k] * (matrix[j][i] / matrix[i][i]);
matrix[j][k] -= e;
}
}
}
// Make 1 elements i, i
for (int i = 0; i < n; i++)
for (int j = 0; j < n + 1; j++) matrix[i][j] /= matrix[i][i];
return matrix;
}
int main() {
float ** matrix;
int n;
printf("Matrix Size : ");
scanf("%d", &n);
// Malloc variable matrix for Matrix
matrix = (float**)malloc(sizeof(float) * n);
for (int i = 0; i < n; i++) matrix[i] = (float*)malloc(sizeof(float) * (n + 1));
printf("Input elements : \n");
for (int i = 0; i < n; i++)
for (int j = 0; j < n + 1; j++) scanf("%f", &matrix[i][j]);
matrix = Gauss_Eli(matrix, n);
printf("Output result : \n");
//Print matrix after elimination
for (int i = 0; i < n; i++) {
for (int j = 0; j < n + 1; j++) printf("%.6f ", matrix[i][j]);
printf("\n");
}
return 0;
}
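1.) OP allocates memory using the wrong type: matrix is an array of n row pointers (float *), but the size is computed with sizeof(float). Where a pointer is larger than a float, the allocation is too small and storing the row pointers writes past its end, which is undefined behavior. That can explain the difference between systems, since they may have differing pointer and float sizes.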
float ** matrix;
// v--- wrong type
// matrix = (float**)malloc(sizeof(float) * n);
Instead allocate to the size of the referenced variable. Easier to code (and get right), review and maintain.
matrix = malloc(sizeof *matrix * n);
if (matrix == NULL) Handle_Error();
2.) Code should look for division by 0.0
//for (int k = 0; k < n + 1; k++) {
// float e;
// e = matrix[i][k] * (matrix[j][i] / matrix[i][i]);
// matrix[j][k] -= e;
//}
if (matrix[i][i] == 0.0) Handle_Error();
float m = matrix[j][i] / matrix[i][i];
for (int k = 0; k < n + 1; k++) {
matrix[j][k] -= matrix[i][k]*m;
}
3.) General problem solving tips:
Check the return value of scanf("%f", &matrix[i][j]);. Is it 1?
Enable all warnings.
Especially for debug, print FP using "%e" rather than "%f".
4.) Numerical analysis tip: Ensure exact subtraction when i==j
if (i == j) {
    for (int k = 0; k < n + 1; k++) {
        matrix[j][k] = 0.0;
    }
}
else {
    if (matrix[i][i] == 0.0) Handle_Divide_by_0();
    float m = matrix[j][i] / matrix[i][i];
    for (int k = 0; k < n + 1; k++) {
        matrix[j][k] -= matrix[i][k]*m;
    }
}
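Points 1 and 2 can be combined into a small sketch. The helper names alloc_augmented(), free_augmented() and eliminate_lower() are made up for illustration, and returning -1 on a zero pivot is just one possible way to handle the error. Hoisting the multiplier out of the k loop also matters for correctness: in the original code matrix[j][i] is overwritten once k reaches i, so every later column is scaled by the wrong factor.

```c
#include <stdlib.h>

/* Allocate an n x (n + 1) augmented matrix; returns NULL on failure. */
float **alloc_augmented(int n)
{
    float **m = malloc(sizeof *m * n);            /* n row pointers */
    if (m == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        m[i] = malloc(sizeof *m[i] * (n + 1));    /* n + 1 floats per row */
        if (m[i] == NULL) {
            while (i-- > 0)                       /* release rows allocated so far */
                free(m[i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

void free_augmented(float **m, int n)
{
    for (int i = 0; i < n; i++)
        free(m[i]);
    free(m);
}

/* Forward elimination; returns 0 on success, -1 if a pivot is exactly zero. */
int eliminate_lower(float **m, int n)
{
    for (int i = 0; i < n; i++) {
        if (m[i][i] == 0.0f)
            return -1;
        for (int j = i + 1; j < n; j++) {
            float f = m[j][i] / m[i][i];          /* computed once per row */
            for (int k = 0; k < n + 1; k++)
                m[j][k] -= m[i][k] * f;
        }
    }
    return 0;
}
```

A caller would check both return values before printing the matrix. A production solver would normally swap rows (pivoting) rather than give up on a zero pivot.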

OpenMP: When can a loop be parallelized?

For each of the following code segments, use OpenMP pragmas to make the loop parallel, or
explain why the code segment is not suitable for parallel execution.
a. for (i = 0; i < sqrt(x); i++) {
a[i] = 2.3 * i;
if (i < 10)
b[i] = a[i];
}
b. flag = 0;
for (i = 0; i < n && !flag; i++) {
a[i] = 2.3 * i;
if (a[i] < b[i])
flag = 1;
}
c. for (i = 0; i < n && !flag; i++)
a[i] = foo(i);
d. for (i = 0; i < n && !flag; i++) {
a[i] = foo(i);
if (a[i] < b[i])
a[i] = b[i];
}
e. for (i = 0; i < n && !flag; i++) {
a[i] = foo(i);
if (a[i] < b[i])
break;
}
f. dotp = 0;
for (i = 0; i < n; i++)
dotp += a[i] * b[i];
g. for (i = k; i < 2 * k; i++)
a[i] = a[i] + a[i - k];
h. for (i = k; i < n; i++) {
a[i] = c * a[i - k];
}
Any help with the above question would be very welcome, even just a line of thinking.
I will not do your HW, but I will give a hint. When working with OpenMP for loops, you should pay attention to the scope of the variables. For example:
#pragma omp parallel for
for(int x=0; x < width; x++)
{
for(int y=0; y < height; y++)
{
finalImage[x][y] = RenderPixel(x,y, &sceneData);
}
}
is OK, since x and y are private variables.
What about
int x,y;
#pragma omp parallel for
for(x=0; x < width; x++)
{
for(y=0; y < height; y++)
{
finalImage[x][y] = RenderPixel(x,y, &sceneData);
}
}
?
Here, we have defined x and y outside of the for loop. OpenMP still makes x private, because it is the iteration variable of the parallelized loop, but y is shared by default. Every thread will read and write y without any synchronization, so data races will occur, which are very likely to result in logical errors.
Read more here and good luck with your HW.
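If the counters really have to be declared outside the loop, one way to avoid the race is the private clause. This is a small sketch rather than a full answer to the exercise: render_pixel() is a hypothetical stand-in for RenderPixel(), and the array sizes are arbitrary. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially.

```c
#define WIDTH 4
#define HEIGHT 3

/* Hypothetical stand-in for RenderPixel(): any function of (x, y) alone. */
static int render_pixel(int x, int y)
{
    return 10 * x + y;
}

void render(int image[WIDTH][HEIGHT])
{
    int x, y;   /* declared outside the loop, as in the second snippet */

    /* x is the iteration variable of the parallel loop, so it is private
       automatically; y is listed explicitly so each thread gets its own copy. */
    #pragma omp parallel for private(y)
    for (x = 0; x < WIDTH; x++)
        for (y = 0; y < HEIGHT; y++)
            image[x][y] = render_pixel(x, y);
}
```

Each iteration of the outer loop writes a different row of image, so no two threads touch the same element.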

% symbols keep printing when trying to print out a char 2D array in C

I'm new to C and I'm just trying to print out a 2D array.
This bug has been annoying me all day and I'm not really sure what's going on.
#include<stdio.h>
void run(int);
main()
{
run(5);
return 0;
}
//Have to make it a character array as it needs to
//store numbers AND commas.
run(int x)
{
int size = 2*x -1;
char array[size][size];
int i = 0;
int j = 0;
for( i; i < size; i++){
for(j; j< size; j++){
array[i][j] = '1';
}
}
int k = 0;
int l = 0;
for( k; k < size; k++){
for(l; l< size; l++){
printf( "%c" , array[l][k]);
}
printf("%\n", "");
}
}
This is the output I get:
1%
%
%
%
%
%
%
%
%
Your code has several mistakes:
The biggest problem is that you're not initializing your loop counters where you should:
for(i; i < size; i++){
for(j; j < size; j++){
With that, i & j are left as they were prior to the for statement. The first section of these statements does nothing at all. While that's harmless for i (since it's initialized to 0 before the for), that's devastating for j, which never goes back to 0. Your code should be:
for(i = 0; i < size; i++){
for(j = 0; j < size; j++){
The same issue exists with k & l, and the same fix should be applied:
for(k = 0; k < size; k++){
for(l = 0; l < size; l++){
Next, you're "rotating" access in your array. When you fill the array with values, you have i in your outer loop and j in the inner loop, and you use them as [i][j]:
array[i][j] = '1';
Think of that as Out & In --> [Out][In].
When you print the array, you "rotate" that, k is outer & l is inner, and you use them as [l][k]:
printf("%c", array[l][k]);
That's like doing [In][Out].
While that's not a problem with all values being identical ('1'), and the matrix being square (width == height), it won't work with other values or dimensions, and is confusing.
Last, your attempt to print a new line is wrong. You have a % specifier, but you're not following it with a valid conversion character, and you don't need one anyway; just print:
printf("\n");
So, all together, here's what the code should be:
void run(int x)
{
int size = 2*x -1;
char array[size][size];
int i,j;
for(i = 0; i < size; i++){
for(j = 0; j < size; j++){
array[i][j] = '1';
}
}
int k, l;
for(k = 0; k < size; k++){
for(l = 0; l < size; l++){
printf("%c", array[k][l]);
}
printf("\n");
}
}
(And as a side note, k & l are not really required, you can simply reuse i & j)
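The same fix can also be written with the counters declared inside each for statement (C99 and later), which makes it impossible to forget the reset. This is only a sketch: the function names fill() and print_grid() and the returned character count are illustrative additions, not part of the original program.

```c
#include <stdio.h>

/* Fill a size x size grid with one character. */
void fill(int size, char grid[size][size], char value)
{
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)   /* j restarts at 0 for every row */
            grid[i][j] = value;
}

/* Print the grid row by row; returns the number of characters written. */
int print_grid(int size, char grid[size][size])
{
    int written = 0;
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++)
            written += printf("%c", grid[i][j]);   /* [row][column], same order as fill() */
        written += printf("\n");
    }
    return written;
}
```

Inside run(), the calls would be fill(size, array, '1'); followed by print_grid(size, array);.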

Remove conditions in a C program for speedup

I have a number crunching C program which involves a main loop with two conditionals:
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
for (k = 0; k < N; k++) {
if (k == i || k == j) continue;
...(calculate a, b, c, d (depending on k)
if (a*a + b*b + c*c < d*d) {break;}
} //k
} //j
} //i
The hardware here is the SPE of the Cell processor, where there is a big penalty when using branching. So in order to optimize my program for speedup I need to remove these 2 conditionals, do you know about good strategies for this?
For the first one, you could break it into multiple loops, e.g. change:
for(int i = 0; i < 1000; i++)
for(int j = 0; j < 1000; j++) {
for(int k = 0; k < 1000; k++) {
if(k==i || k == j) continue;
// other code
}
}
to:
for(int i = 0; i < 1000; i++)
for(int j = 0; j < 1000; j++) {
for(int k = 0; k < min(i, j); k++) {
// other code
}
for(int k = min(i, j) + 1; k < max(i, j); k++) {
// other code
}
for(int k = max(i, j) + 1; k < 1000; k++) {
// other code
}
}
To remove the second, you could store the previous total and use it in the for loop conditions, i.e.:
int left_side, right_side;
for(int i = 0; i < N; i++)
    for(int j = 0; j < N; j++) {
        // reset for every (i, j) pair: the original break only leaves the k loop
        left_side = 1;
        right_side = 0;
        for(int k = 0; k < min(i, j) && left_side >= right_side; k++) {
            // other code (calculate a, b, c, d)
            left_side = a * a + b * b + c * c;
            right_side = d * d;
        }
        for(int k = min(i, j) + 1; k < max(i, j) && left_side >= right_side; k++) {
            // same as in previous loop
        }
        for(int k = max(i, j) + 1; k < N && left_side >= right_side; k++) {
            // same as in previous loop
        }
    }
Implementing min and max without branching could also be tricky. Maybe this version is better:
int i, j, k,
    left_side, right_side;
for(i = 0; i < N; i++) {
    // this loop covers the case where j < i
    for(j = 0; j < i; j++) {
        k = 0;
        left_side = 1; right_side = 0; // reset for every (i, j) pair
        for(; k < j && left_side >= right_side; k++) {
            // other code (calculate a, b, c, d)
            left_side = a * a + b * b + c * c;
            right_side = d * d;
        }
        k++; // skip k == j
        for(; k < i && left_side >= right_side; k++) {
            // same as in previous loop
        }
        k++; // skip k == i
        for(; k < N && left_side >= right_side; k++) {
            // same as in previous loop
        }
    }
    j++; // skip j == i (note: the original loop still visits j == i, with k != i)
    // and now, j > i
    for(; j < N; j++) {
        k = 0;
        left_side = 1; right_side = 0; // reset for every (i, j) pair
        for(; k < i && left_side >= right_side; k++) {
            // other code (calculate a, b, c, d)
            left_side = a * a + b * b + c * c;
            right_side = d * d;
        }
        k++; // skip k == i
        for(; k < j && left_side >= right_side; k++) {
            // same as in previous loop
        }
        k++; // skip k == j
        for(; k < N && left_side >= right_side; k++) {
            // same as in previous loop
        }
    }
}
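Before relying on a split like this, it is worth checking that it visits exactly the same k values as the original loop. The sketch below does that with a hypothetical predicate hit() standing in for the a, b, c, d test; all names are illustrative. It only demonstrates the removal of the k == i || k == j test and keeps the early exit as an ordinary branch.

```c
/* Hypothetical stand-in for "calculate a, b, c, d and compare":
   a nonzero result ends the k loop. */
static int hit(int k)
{
    return k == 7;
}

/* Scan [lo, hi) without any index comparison in the body.
   Returns 1 if hit() ended the scan early. */
static int scan(int lo, int hi, long *visited)
{
    for (int k = lo; k < hi; k++) {
        ++*visited;
        if (hit(k))
            return 1;
    }
    return 0;
}

/* Reference: the original loop with the k == i || k == j test. */
long count_naive(int n)
{
    long visited = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) {
                if (k == i || k == j)
                    continue;
                ++visited;
                if (hit(k))
                    break;
            }
    return visited;
}

/* Split version: three ranges that leave out min(i, j) and max(i, j). */
long count_split(int n)
{
    long visited = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            int lo = i < j ? i : j;
            int hi = i < j ? j : i;
            if (scan(0, lo, &visited))
                continue;
            if (scan(lo + 1, hi, &visited))
                continue;
            scan(hi + 1, n, &visited);   /* when i == j the middle range is empty */
        }
    return visited;
}
```

Comparing count_naive(n) and count_split(n) for a few values of n gives a quick sanity check that the ranges are right, including the i == j case.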
I agree with 'sje397'.
Besides this, you provide too little information about your problem. You say branching is pricey. But how often does it actually happen? Maybe your problem is that compiler-generated code does branching in the common scenario?
Perhaps you could rearrange your ifs. How an if is compiled is compiler-dependent, but many compilers treat it in a straightforward way: the if body is the common fall-through path, and the else is the rare path reached by a jump.
Then try the following:
for (i = 0; i < N; i++) {
for (j = 0; j < N; j++) {
for (k = 0; k < N; k++) {
if (k != i && k != j)
{
...(calculate a, b, c, d)
if (a*a + b*b + c*c >= d*d)
{
...
} else
break;
}
} //k
} //j
} //i
EDIT:
Of course, you may go down to the assembler level to make sure the right code is generated.
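Short of reading the assembler, GCC and Clang offer __builtin_expect() to tell the compiler which way a branch usually goes. The sketch below assumes one of those compilers (the macros fall back to plain conditions elsewhere). The LIKELY/UNLIKELY macro names and the first_below() function are illustrative, and whether the hint helps on a given target has to be measured.

```c
#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Returns the first index k, other than skip_i and skip_j, whose value is
   below limit, or -1 if there is none. */
int first_below(const int *v, int n, int skip_i, int skip_j, int limit)
{
    for (int k = 0; k < n; k++) {
        if (LIKELY(k != skip_i && k != skip_j)) {   /* common case: k is not skipped */
            if (UNLIKELY(v[k] < limit))             /* rare case: early exit */
                return k;
        }
    }
    return -1;
}
```

The structure mirrors the rearranged loop above: the common path falls through, and the rare exit is the one marked as unlikely.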
I would look first at your calculate code, because that could swamp all these branching issues. Some sampling would find out for sure.
However, it looks like you're doing, for each i,j, a linear search for the first point inside a sphere. Could you have 3 arrays, one for each of the X, Y, and Z axes, and in each array store indexes of all the original points in ascending order by that axis? That could facilitate a nearest-neighbor search. Also, you might be able to use an in-cube test, rather than an in-sphere test, since you're not hunting for the closest point, but only a nearby point.
Are you sure you actually need the first if-statement? Even if it skips one calculation when k equals i or j, the penalty for checking it on every iteration is very costly. Also, keep in mind that if N is not a constant, the compiler probably won't be able to unroll the for loops.
Although, if it's a Cell processor, the compiler might even try to vectorize the loops.
If the for loops compile to normal iterative loops, it could be an idea to make them compare with zero instead, as the decrement operation will often do the comparison for you when it hits zero.
for (i = 0; i < N; i++) {
...can become...
for (i = N; i != 0; i--) {
Although, if "i" is used as an index or a variable in a calculation, you might get performance degradation as you will get cache misses.

Resources