Multiply matrices in a loop in C

I am attempting to multiply several matrices using a loop in C. I obtain the expected answer in R, but not in C. I suspect the problem is related to the += operator, which seems to double the value of the product after the first iteration of the loop.
I am not very familiar with C and have not been able to replace the += with something that returns the expected answer.
Thank you for any advice.
First, here is the R code that returns the expected answer:
B0 = -0.40
B1 = 0.20
mycov1 = exp(B0 + -2 * B1) / (1 + exp(B0 + -2 * B1))
mycov2 = exp(B0 + -1 * B1) / (1 + exp(B0 + -1 * B1))
mycov3 = exp(B0 + 0 * B1) / (1 + exp(B0 + 0 * B1))
mycov4 = exp(B0 + 1 * B1) / (1 + exp(B0 + 1 * B1))
trans1 = matrix(c(1 - 0.25 - mycov1, mycov1, 0.25 * 0.80, 0,
0, 1 - 0.50, 0, 0.50 * 0.75,
0, 0, 1, 0,
0, 0, 0, 1),
nrow=4, ncol=4, byrow=TRUE)
trans2 = matrix(c(1 - 0.25 - mycov2, mycov2, 0.25 * 0.80, 0,
0, 1 - 0.50, 0, 0.50 * 0.75,
0, 0, 1, 0,
0, 0, 0, 1),
nrow=4, ncol=4, byrow=TRUE)
trans3 = matrix(c(1 - 0.25 - mycov3, mycov3, 0.25 * 0.80, 0,
0, 1 - 0.50, 0, 0.50 * 0.75,
0, 0, 1, 0,
0, 0, 0, 1),
nrow=4, ncol=4, byrow=TRUE)
trans4 = matrix(c(1 - 0.25 - mycov4, mycov4, 0.25 * 0.80, 0,
0, 1 - 0.50, 0, 0.50 * 0.75,
0, 0, 1, 0,
0, 0, 0, 1),
nrow=4, ncol=4, byrow=TRUE)
trans2b <- trans1 %*% trans2
trans3b <- trans2b %*% trans3
trans4b <- trans3b %*% trans4
trans4b
#
# This is the expected answer
#
# [,1] [,2] [,3] [,4]
# [1,] 0.01819965 0.1399834 0.3349504 0.3173467
# [2,] 0.00000000 0.0625000 0.0000000 0.7031250
# [3,] 0.00000000 0.0000000 1.0000000 0.0000000
# [4,] 0.00000000 0.0000000 0.0000000 1.0000000
#
Here is my C code. The C code is fairly long because I do not know C well enough to be efficient:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
char quit;
int main(){
int i, j, k, ii, jj, kk ;
double B0, B1, mycov ;
double trans[4][4] = {0} ;
double prevtrans[4][4] = {{1,0,0,0},
{0,1,0,0},
{0,0,1,0},
{0,0,0,1}};
B0 = -0.40 ;
B1 = 0.20 ;
for (i=1; i <= 4; i++) {
mycov = exp(B0 + B1 * (-2+i-1)) / (1 + exp(B0 + B1 * (-2+i-1))) ;
trans[0][0] = 1 - 0.25 - mycov ;
trans[0][1] = mycov ;
trans[0][2] = 0.25 * 0.80 ;
trans[0][3] = 0 ;
trans[1][0] = 0 ;
trans[1][1] = 1 - 0.50 ;
trans[1][2] = 0 ;
trans[1][3] = 0.50 * 0.75 ;
trans[2][0] = 0 ;
trans[2][1] = 0 ;
trans[2][2] = 1 ;
trans[2][3] = 0 ;
trans[3][0] = 0 ;
trans[3][1] = 0 ;
trans[3][2] = 0 ;
trans[3][3] = 1 ;
for (ii=0; ii<4; ii++){
for(jj=0; jj<4; jj++){
for(kk=0; kk<4; kk++){
trans[ii][jj] += trans[ii][kk] * prevtrans[kk][jj] ;
}
}
}
prevtrans[0][0] = trans[0][0] ;
prevtrans[0][1] = trans[0][1] ;
prevtrans[0][2] = trans[0][2] ;
prevtrans[0][3] = trans[0][3] ;
prevtrans[1][0] = trans[1][0] ;
prevtrans[1][1] = trans[1][1] ;
prevtrans[1][2] = trans[1][2] ;
prevtrans[1][3] = trans[1][3] ;
prevtrans[2][0] = trans[2][0] ;
prevtrans[2][1] = trans[2][1] ;
prevtrans[2][2] = trans[2][2] ;
prevtrans[2][3] = trans[2][3] ;
prevtrans[3][0] = trans[3][0] ;
prevtrans[3][1] = trans[3][1] ;
prevtrans[3][2] = trans[3][2] ;
prevtrans[3][3] = trans[3][3] ;
}
printf("To close this program type 'quit' and hit the return key\n");
printf(" \n");
scanf("%d", &quit);
return 0;
}
Here is the final matrix returned by the above C code:
0.4821 3.5870 11.68 381.22
0 1 0 76.875
0 0 5 0
0 0 0 5

This line
trans[ii][jj] += trans[ii][kk] * prevtrans[kk][jj] ;
is not right. You're modifying trans in place while you are still using it to compute the resultant matrix, and the products are added on top of the values already stored in trans. You need another matrix, for example double temp[4][4], to store the result of the multiplication temporarily. Note also that the R code keeps the running product on the left (trans1 %*% trans2), so the operands should be in the order prevtrans * trans. Then use:
// Store the resultant matrix (prevtrans * trans) in temp.
for (ii=0; ii<4; ii++){
for(jj=0; jj<4; jj++){
temp[ii][jj] = 0.0;
for(kk=0; kk<4; kk++){
temp[ii][jj] += prevtrans[ii][kk] * trans[kk][jj] ;
}
}
}
// Transfer the data from temp to trans
for (ii=0; ii<4; ii++){
for(jj=0; jj<4; jj++){
trans[ii][jj] = temp[ii][jj];
}
}
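As a cross-check against the expected R output, here is a minimal Python sketch of the same loop. It is an illustration rather than the asker's program: the helper names make_trans, matmul and chain_product are made up for this example, and it assumes the goal is the left-to-right product trans1 %*% trans2 %*% trans3 %*% trans4 from the R code.

```python
import math

def make_trans(i, b0=-0.40, b1=0.20):
    """Build the 4x4 transition matrix for step i (1..4), as in the question."""
    z = b0 + b1 * (-2 + i - 1)
    mycov = math.exp(z) / (1 + math.exp(z))
    return [
        [1 - 0.25 - mycov, mycov, 0.25 * 0.80, 0.0],
        [0.0, 1 - 0.50, 0.0, 0.50 * 0.75],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul(a, b):
    """Return a * b in a fresh matrix, so neither input is overwritten."""
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i][j] += a[i][k] * b[k][j]
    return out

def chain_product(steps=4):
    """Multiply trans1 * trans2 * ... with the running product on the left."""
    prev = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for i in range(1, steps + 1):
        prev = matmul(prev, make_trans(i))
    return prev

result = chain_product()
for row in result:
    print(" ".join(f"{v:.7f}" for v in row))
```

Because matmul writes into a new matrix on every call, the accumulation never reads values it has already overwritten, which is the same role temp plays in the C loop.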

Related

How do you invert euclidean (transform and rotation only) matrices in C?

How do you invert 4x3 matrices that are only translation and rotation, no scale? The sort of thing you would use to do an OpenGL Matrix inverse (just without scaling)?
Assuming your TypeMatrix3x4 is a struct holding a [3][4] array named m, and the matrix contains only rotation and translation (1:1 scale), the following code seems to work.
It transposes the rotation part and then applies the negated translation through that transpose.
TypeMatrix3x4 InvertHmdMatrix34( TypeMatrix3x4 mtoinv )
{
int i, j;
TypeMatrix3x4 out = { 0 };
for( i = 0; i < 3; i++ )
for( j = 0; j < 3; j++ )
out.m[j][i] = mtoinv.m[i][j];
for ( i = 0; i < 3; i++ )
{
out.m[i][3] = 0;
for( j = 0; j < 3; j++ )
out.m[i][3] += out.m[i][j] * -mtoinv.m[j][3];
}
return out;
}
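The same idea can be checked with a short Python sketch. The function names invert_rigid_3x4 and apply_3x4 and the sample pose are assumptions made for this example, and the matrix is stored as a plain list of rows rather than a struct. Mapping a point through the pose and then through the computed inverse should return the original point.

```python
import math

def invert_rigid_3x4(m):
    """Invert a 3x4 [R | t] matrix, assuming R is a pure rotation."""
    out = [[0.0] * 4 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            out[j][i] = m[i][j]  # transpose of the rotation block
    for i in range(3):
        # New translation is -R^T * t
        out[i][3] = -sum(out[i][j] * m[j][3] for j in range(3))
    return out

def apply_3x4(m, p):
    """Apply a 3x4 transform to a 3D point."""
    return [sum(m[i][j] * p[j] for j in range(3)) + m[i][3] for i in range(3)]

angle = math.radians(30)
pose = [
    [math.cos(angle), -math.sin(angle), 0.0, 1.0],
    [math.sin(angle),  math.cos(angle), 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
]
point = [0.5, -1.5, 2.0]
round_trip = apply_3x4(invert_rigid_3x4(pose), apply_3x4(pose, point))
print(round_trip)
```

This shortcut only holds when the 3x3 block is orthonormal; with scaling or shear the transpose is no longer the inverse.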
You can solve this for any three-dimensional affine transformation whose 3x3 linear part is invertible. This lets you include scaling and non-conformal transformations such as shear; the only requirement is that the 3x3 matrix is invertible.
Simply extend your 3x4 matrix to 4x4 by adding a row that is all zeros except for a 1 in the last element, and invert that matrix. For example:
[[a b c d]   [[x]    [[x']
 [e f g h] *  [y]  =  [y']
 [i j k l]    [z]     [z']
 [0 0 0 1]]   [1]]    [1 ]]   (added row)
It's easy to see that this 4x4 matrix, applied to your vector, produces exactly the same vector as before the extension.
If you get the inverse of that matrix, you'll have:
[[A B C D]   [[x']   [[x]
 [E F G H] *  [y'] =  [y]
 [I J K L]    [z']    [z]
 [0 0 0 1]]   [1 ]]   [1]]
If the matrix maps a point p to p', then its inverse maps p' back to p; the only requirement is that the matrix is invertible.
Going further: if you have a list of transformed vectors that you want to map back, you can apply the Gauss elimination method to an extended matrix of the form:
[[a b c d x0' x1' x2' ... xn']
[e f g h y0' y1' y2' ... yn']
[i j k l z0' z1' z2' ... zn']
[0 0 0 1 1 1 1 ... 1 ]]
To obtain the preimages of all the vectors, carry out the elimination until the matrix above becomes:
[[1 0 0 0 x0 x1 x2 ... xn ]
[0 1 0 0 y0 y1 y2 ... yn ]
[0 0 1 0 z0 z1 z2 ... zn ]
[0 0 0 1 1 1 1 ... 1 ]]
and you will have solved all the problems in one shot, because the column vectors on the right are the ones that, once transformed, produce the original columns.
You can get a simple implementation of the Gauss/Jordan elimination method, which I wrote to teach my son about linear algebra, here. It's open source (BSD license) and you can modify or adapt it to your needs. It uses the last approach, and you can use it out of the box by trying the sist_lin program.
If you want the inverse transformation, put the following contents in the matrix, and apply Gauss elimination to:
a b c d 1 0 0 0
e f g h 0 1 0 0
i j k l 0 0 1 0
0 0 0 1 0 0 0 1
as input to sist_lin and you get:
1 0 0 0 A B C D <-- these are the coefs of the
0 1 0 0 E F G H inverse transformation
0 0 1 0 I J K L
0 0 0 1 0 0 0 1
you will have:
a * x + b * y + c * z + d = X
e * x + f * y + g * z + h = Y
i * x + j * y + k * z + l = Z
0 * x + 0 * y + 0 * z + 1 = 1
and
A * X + B * Y + C * Z + D = x
E * X + F * Y + G * Z + H = y
I * X + J * Y + K * Z + L = z
0 * X + 0 * Y + 0 * Z + 1 = 1
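The augmented-matrix procedure described above can be sketched in Python as follows. This is not the sist_lin program mentioned in the answer; gauss_jordan_inverse is a hypothetical helper written for this example, and the sample affine matrix is arbitrary. It reduces [M | I] to [I | M^-1] with partial pivoting.

```python
def gauss_jordan_inverse(m):
    """Invert a square matrix by reducing [M | I] to [I | M^-1]."""
    n = len(m)
    # Build the augmented matrix [M | I].
    aug = [list(map(float, row)) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]

# Sample 4x4 affine matrix with scaling and shear (values chosen arbitrarily).
affine = [
    [2.0, 0.5, 0.0, 1.0],
    [0.0, 1.0, 0.3, -2.0],
    [0.0, 0.0, 4.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
inverse = gauss_jordan_inverse(affine)
for row in inverse:
    print(row)
```

The last row of the result stays 0 0 0 1 (up to rounding), so the first three rows can be used directly as the 3x4 inverse transformation.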

Defining variables explicitly vs accessing arrays

I am implementing the Runge-Kutta-Fehlberg method with adaptive step-size (RK45). I define and call my Butcher tableau in a notebook with
module FehlbergTableau
using StaticArrays
export A, B, CH, CT
A = @SVector [ 0 , 2/9 , 1/3 , 3/4 , 1 , 5/6 ]
B = @SMatrix [ 0 0 0 0 0
2/9 0 0 0 0
1/12 1/4 0 0 0
69/128 -243/128 135/64 0 0
-17/12 27/4 -27/5 16/15 0
65/432 -5/16 13/16 4/27 5/144 ]
CH = @SVector [ 47/450 , 0 , 12/25 , 32/225 , 1/30 , 6/25 ]
CT = @SVector [ -1/150 , 0 , 3/100 , -16/75 , -1/20 , 6/25 ]
end
using .FehlbergTableau
If I code the algorithm for RK45 straightforwardly as
function infinitesimal_flow(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64, Δt::Float64, J∇H::Function, x0::SVector{N,Float64}) where N
k1 = Δt * J∇H( t0 + Δt*A[1], x0 )
k2 = Δt * J∇H( t0 + Δt*A[2], x0 + B[2,1]*k1 )
k3 = Δt * J∇H( t0 + Δt*A[3], x0 + B[3,1]*k1 + B[3,2]*k2 )
k4 = Δt * J∇H( t0 + Δt*A[4], x0 + B[4,1]*k1 + B[4,2]*k2 + B[4,3]*k3 )
k5 = Δt * J∇H( t0 + Δt*A[5], x0 + B[5,1]*k1 + B[5,2]*k2 + B[5,3]*k3 + B[5,4]*k4 )
k6 = Δt * J∇H( t0 + Δt*A[6], x0 + B[6,1]*k1 + B[6,2]*k2 + B[6,3]*k3 + B[6,4]*k4 + B[6,5]*k5 )
TE = CT[1]*k1 + CT[2]*k2 + CT[3]*k3 + CT[4]*k4 + CT[5]*k5 + CT[6]*k6
xt = x0 + CH[1]*k1 + CH[2]*k2 + CH[3]*k3 + CH[4]*k4 + CH[5]*k5 + CH[6]*k6
norm(TE), xt
end
and compare it with the more compact implementation
function infinitesimal_flow_2(A::SVector{6,Float64}, B::SMatrix{6,5,Float64}, CH::SVector{6,Float64}, CT::SVector{6,Float64}, t0::Float64,Δt::Float64,J∇H::Function, x0::SVector{N,Float64}) where N
k = MMatrix{N,6}(0.0I)
TE = zero(x0); xt = x0
for i=1:6
# EDIT: this is wrong! there should be a new variable here, as pointed
# out by Lutz Lehmann: xs = x0
for j=1:i-1
# xs += B[i,j] * k[:,j]
x0 += B[i,j] * k[:,j] #wrong
end
k[:,i] = Δt * J∇H(t0 + Δt*A[i], x0)
TE += CT[i]*k[:,i]
xt += CH[i]*k[:,i]
end
norm(TE), xt
end
Then the first function, which defines variables explicitly, is much faster:
J∇H(t::Float64, X::SVector{N,Float64}) where N = @SVector [ -X[2]^2, X[1] ]
x0 = SVector{2}([0.0, 1.0])
infinitesimal_flow(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)
infinitesimal_flow_2(A, B, CH, CT, 0.0, 1e-2, J∇H, x0)
@btime infinitesimal_flow($A, $B, $CH, $CT, 0.0, 1e-2, $J∇H, $x0)
>> 19.387 ns (0 allocations: 0 bytes)
@btime infinitesimal_flow_2($A, $B, $CH, $CT, 0.0, 1e-2, $J∇H, $x0)
>> 50.985 ns (0 allocations: 0 bytes)
I cannot find a type instability or anything to justify the lag, and for more complex tableaus it is mandatory that I use the algorithm in loop form. What am I doing wrong?
P.S.: The bottleneck in infinitesimal_flow_2 is the line k[:,i] = Δt * J∇H(t0 + Δt*A[i], x0).
Each stage of the RK method computes its evaluation point directly from the base point of the RK step. This is explicit in the first method. In the second method you would have to reset the point computation in each stage, such as in
for i=1:6
xs = x0
for j=1:i-1
xs += B[i,j] * k[:,j]
end
k[:,i] = Δt * J∇H(t0 + Δt*A[i], xs)
...
The slightest error in the step computation can catastrophically throw off the step-size controller, forcing the step size to fall towards zero and thus the effort to increase drastically. An example is the 4101 error in RKF45
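For illustration, here is the corrected stage loop as a small Python sketch (Python rather than Julia, and plain lists rather than static arrays, so it says nothing about the timing question). The tableau values are copied from the question; the name rk_step is an assumption of this example. The key point is that xs is reset to the base point x0 at the start of every stage.

```python
import math

# Tableau copied from the question (lower-triangular rows of B only).
A = [0, 2/9, 1/3, 3/4, 1, 5/6]
B = [
    [],
    [2/9],
    [1/12, 1/4],
    [69/128, -243/128, 135/64],
    [-17/12, 27/4, -27/5, 16/15],
    [65/432, -5/16, 13/16, 4/27, 5/144],
]
CH = [47/450, 0, 12/25, 32/225, 1/30, 6/25]
CT = [-1/150, 0, 3/100, -16/75, -1/20, 6/25]

def rk_step(f, t0, dt, x0):
    """One embedded RK step; returns (norm of error estimate, new state)."""
    n = len(x0)
    k = []
    for i in range(6):
        xs = list(x0)  # reset to the base point for every stage
        for j in range(i):
            for d in range(n):
                xs[d] += B[i][j] * k[j][d]
        k.append([dt * v for v in f(t0 + dt * A[i], xs)])
    te = [sum(CT[i] * k[i][d] for i in range(6)) for d in range(n)]
    xt = [x0[d] + sum(CH[i] * k[i][d] for i in range(6)) for d in range(n)]
    return math.sqrt(sum(v * v for v in te)), xt

# Same right-hand side and starting point as in the question.
err, xt = rk_step(lambda t, x: [-x[1] ** 2, x[0]], 0.0, 1e-2, [0.0, 1.0])
print(err, xt)
```

If xs were accumulated across stages instead of being reset, later stages would be evaluated at the wrong points and the error estimate would no longer be meaningful.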

How to convert variables to an array in a pattern efficiently

Apologies for the clumsy wording, I am struggling on how to describe this problem.
My goal is to write a function that takes in three variables and outputs a 2D array with this pattern:
var foo = function(x, y, z) {
array = [
[x + 8, y + 16, z + 35],
[x + 6, y + 8, z + 30],
[x + 4, y + 4, z + 20],
[x + 2, y + 2, z + 10],
[x , y , z ],
[x - 2, y + 2, z - 10],
[x - 4, y + 4, z - 20],
[x - 6, y + 8, z - 30],
[x - 8, y + 16, z - 35]
]
return array;
}
Obviously, this way of writing the function seems pretty inefficient.
One way I tried to solve this is with a loop. But my solution introduces three arrays and is also pretty inelegant.
var x_mod = [8, 6, 4, 2, 0, -2, -4, -6, -8];
var y_mod = [16, 8, 4, 2, 0, 2, 4, 8, 16];
var z_mod = [35, 30, 20, 10, 0, -10, -20, -30, -35];
for(let i = 0; i < 9; i++) {
array[i] = [x + x_mod[i], y + y_mod[i], z + z_mod[i]];
}
Is there a better way of writing this algorithm? I would also appreciate any clues as to what this kind of problem is called, or what I should study to solve it.
Thank you!
EDIT
This is an example of the kind of optimization I was thinking of.
The following function
var bar = function(x, y, z) {
array = [
[x + 1, y + 2, z + 3],
[x + 2, y + 4, z + 6],
[x + 3, y + 6, z + 9]
]
return array;
}
could also be written in the following way:
var bar = function(x, y, z) {
array = [];
for(var i = 1; i < 4; i++)
array.push([x + i, y + i*2, z + i*3]);
return array;
}
This is the kind of "optimization" that I wanted to apply to my original problem. Again, I apologize that I lack the vocabulary to adequately describe this problem.
Is this what you are looking for (in C#)?
static class Program
{
static void Main(string[] args)
{
var m_2 = GenerateMatrix(2, 0.0, 0.0, 0.0);
// result:
// | 2.0 2.0 10.0 | + span = 2
// | 0.0 0.0 0.0 | +
// | -2.0 -2.0 -10.0 |
var m_3 = GenerateMatrix(3, 0.0, 0.0, 0.0);
// result:
// | 4.0 4.0 20.0 | +
// | 2.0 2.0 10.0 | | span = 3
// | 0.0 0.0 0.0 | +
// | -2.0 -2.0 -10.0 |
// | -4.0 -4.0 -20.0 |
var m_5 = GenerateMatrix(5, 0.0, 0.0, 0.0);
// result:
// | 8.0 16.0 40.0 | +
// | 6.0 8.0 30.0 | |
// | 4.0 4.0 20.0 | | span = 5
// | 2.0 2.0 10.0 | |
// | 0.0 0.0 0.0 | +
// | -2.0 -2.0 -10.0 |
// | -4.0 -4.0 -20.0 |
// | -6.0 -8.0 -30.0 |
// | -8.0 -16.0 -40.0 |
}
static double[][] GenerateMatrix(int span, double x, double y, double z)
{
var result = new double[2*(span-1)+1][];
result[span-1] = new double[] { x, y, z };
for (int i = 0; i < span-1; i++)
{
result[span-2-i] = new double[] { x+2*(i+1), y + (2<<i), z + 10*(i+1) };
result[span+i] = new double[] { x-2*(i+1), y - (2<<i), z - 10*(i+1) };
}
return result;
}
}
I am using the following rules (with counter = 1..span-1). The rows are set symmetrically from the middle, since they follow the same pattern with only + or - as a difference:
x values are multiples of two: x+2*counter and x-2*counter
y values are powers of two: pow(2,counter) = 1<<counter (written as 2<<i in the code, where i = counter-1)
z values are multiples of ten: z+10*counter and z-10*counter
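If the exact offsets from the question matter (including the 35 at the ends, and y always being added), one option is to store only half of the table and mirror it around the middle row. Here is a minimal Python sketch of that idea; the name offsets_pattern and the half table are assumptions made for this example.

```python
def offsets_pattern(x, y, z):
    """Build the 9-row pattern from the question by mirroring half a table."""
    # Offsets for the rows next to the middle row, nearest first.
    half = [(2, 2, 10), (4, 4, 20), (6, 8, 30), (8, 16, 35)]
    # Rows above the middle: all offsets added, farthest row first.
    upper = [[x + dx, y + dy, z + dz] for dx, dy, dz in reversed(half)]
    # Rows below the middle: x and z offsets subtracted, y still added.
    lower = [[x - dx, y + dy, z - dz] for dx, dy, dz in half]
    return upper + [[x, y, z]] + lower

for row in offsets_pattern(0, 0, 0):
    print(row)
```

This keeps the irregular values as data instead of forcing them into a formula, which tends to be easier to adjust when the pattern changes.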
While I think that your first definition is the best, formulas might be defined:
diff = (4 - i)
ad = abs(diff)
x + diff * 2
y + (1 << abs(ad)) - trunc((4 - ad) / 4)
//using bit shift to compose power of two if possible
z + 10 * diff - 5 * trunc(diff / 4)
//rounding towards zero!
Python check:
import math
for i in range(0, 9):
diff = (4 - i)
ad = abs(diff)
print(i, diff * 2, (1 << abs(ad)) - (4 - ad) // 4, 10 * diff - 5 * math.trunc(diff / 4))
0 8 16 35
1 6 8 30
2 4 4 20
3 2 2 10
4 0 0 0
5 -2 2 -10
6 -4 4 -20
7 -6 8 -30
8 -8 16 -35
You can use a recursive approach for your solution:
var your_array = []
function myFun(x, y, z, count){
//base case
if(count === 4)
return;
// head recursion
var temp = [];
temp.push(x); temp.push(y); temp.push(z);
your_array.push(temp);
myFun(x-2, y/2, z-10, count+1)
//tail recursion
temp = []
temp.push(x); temp.push(y); temp.push(z);
your_array.push(temp);
}

Karatsuba Algorithm: splitting strings

I am trying to implement the Karatsuba algorithm in C.
I work with char strings (which are digits in a certain base), and although I think I have understood most of the Karatsuba algorithm, I do not get where I should split the strings to multiply.
For example, where should I cut 123 * 123, and where should I cut 123 * 12?
I can't get to a solution that works with both these calculations.
I tried cutting it in half and flooring the result when the length is odd, but it did not work, and ceiling does not work either.
Any clue?
Let a, b, c, and d be the parts of the strings.
Let's try with 123 * 12
First try (a = 1, b = 23, c = 1, d = 2) (fail)
z0 = a * c = 1
z1 = b * d = 46
z2 = (a + b) * (c + d) - z0 - z1 = 24 * 3 - 1 - 46 = 72 - 1 - 46 = 25
z0_padded = 100
z2_padded = 250
z0_padded + z1 + z2_padded = 100 + 46 + 250 = 396 != 123 * 12
Second try (a = 12, b = 3, c = 12, d = 0) (fail)
z0 = 144
z1 = 0
z2 = 15 * 12 - z1 - z0 = 180 - 144 = 36
z0_padded = 14400
z2_padded = 360
z0_padded + z1 + z2_padded = 14760 != 1476
Third try (a = 12, b = 3, c = 0, d = 12) (success)
z0 = 0
z1 = 36
z2 = 15 * 12 - z0 - z1 = 144
z0_padded = 0
z2_padded = 1440
z0_padded + z1 + z2_padded = 1476 == 1476
Let's try with 123 * 123
First try (a = 1, b = 23, c = 1, d = 23) (fail)
z0 = 1
z1 = 23 * 23 = 529
z2 = 24 * 24 - z0 - z1 = 46
z0_padded = 100
z2_padded = 460
z0_padded + z1 + z2_padded = 100 + 529 + 460 = 1089 != 15129
Second try (a = 12, b = 3, c = 12, d = 3) (success)
z0 = 12 * 12 = 144
z1 = 3 * 3 = 9
z2 = 15 * 15 - z0 - z1 = 72
z0_padded = 14400
z2_padded = 720
z0_padded + z1 + z2_padded = 15129 == 15129
Third try (a = 12, b = 3, c = 1, d = 23) (fail)
z0 = 12
z1 = 3 * 23 = 69
z2 = 15 * 24 - z0 - z1 = 279
z0_padded = 1200
z2_padded = 2790
z0_padded + z1 + z2_padded = 1200 + 69 + 2790 = 4059 != 15129
Here, I do not get where I messed this up. Note that my padding method adds n zeroes at the end of a number where n = m * 2 and m equals the size of the longest string divided by two.
EDIT
Now that I have understood that b and d must be of the same length, it works almost every time, but there are still exceptions: for example 1234*12
a = 123
b = 4
c = 1
d = 2
z0 = 123
z1 = 8
z2 = 127 * 3 - 123 - 8 = 250
z0_padded = 1230000
z2_padded = 25000
z0_padded + z1 + z2_padded = 1255008 != 14808
Here, assuming I split the strings correctly, the problem is the padding, but I do not get how I should pad. I read on Wikipedia that I should pad depending on the size of the biggest string (see a few lines up), but there must be another way.
The Karatsuba algorithm is a nice way to perform multiplications.
If you want it to work, b and d must be of the same length.
Here are two possibilities to compute 123x12 :
a= 1;b=23;c=0;d=12;
a=12;b= 3;c=1;d= 2;
Let's explain how it works for the second case :
123=12×10+3
12= 1×10+2
123×12=(12×10+3)×(1×10+2)
123×12=12×1×100+ (12×2+3×1)×10+3×2
123×12=12×1×100+((12+3)×(1+2)-12×1-3×2)×10+3×2
Let's explain how it works for the first case :
123=1×100+23
12=0×100+12
123×12=(1×100+23)×(0×100+12)
123×12=1×0×10000+ (1×12+23×0)×100+23×12
123×12=1×0×10000+((1+23)×(0+12)-1×0-23×12)×100+23×12
It also works with 10^k, 2^k or n instead of 10 or 100.
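To make the splitting rule concrete, here is a minimal Python sketch working on decimal digit strings. It is only one possible way to do it: the function name karatsuba is chosen for this example, and it assumes the shorter operand is left-padded with zeros so that both low parts b and d have the same number of digits m. The padding then follows from the split: z0 is shifted by 2*m digits and z2 by m digits.

```python
def karatsuba(x: str, y: str) -> int:
    """Multiply two non-negative decimal digit strings."""
    # Base case: a single-digit operand is multiplied directly.
    if len(x) == 1 or len(y) == 1:
        return int(x) * int(y)
    # Left-pad the shorter operand so both have the same length.
    n = max(len(x), len(y))
    x, y = x.zfill(n), y.zfill(n)
    m = n // 2  # number of digits in the low parts b and d
    a, b = x[:-m], x[-m:]
    c, d = y[:-m], y[-m:]
    z0 = karatsuba(a, c)  # high * high
    z1 = karatsuba(b, d)  # low * low
    # (a + b) * (c + d) - z0 - z1 equals a*d + b*c
    z2 = karatsuba(str(int(a) + int(b)), str(int(c) + int(d))) - z0 - z1
    return z0 * 10 ** (2 * m) + z2 * 10 ** m + z1

print(karatsuba("1234", "12"))
```

For 1234 * 12 this gives m = 2, so a = 12, b = 34, c = 00 and d = 12, hence z0 = 0, z1 = 408 and z2 = 46 * 12 - 0 - 408 = 144, and the result is 0 * 10000 + 144 * 100 + 408 = 14808.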

Matlab - arranging numbers

I have vectors m, x, y & I want m1, x1, y1 as commented below:
% given
m = [-4 -3 -2 2 3 4];
x = [2 5 6 7 9 1];
y = [10 23 34 54 27 32];
% required
% m1 = [2 3 4]; % only +ve value from m
% x1 = [13 14 3]; % adding numbers(in x) corres. to -ve & +ve value in m & putting below 2, 3, 4 respectively
% y1 = [88 50 42]; % adding numbers(in y) corres. to -ve & +ve value in m & putting below 2, 3, 4 respectively
m1 = m(m > 0) % this gives me m1 as required
Any hint for x1, y1 will be very helpful.
Assuming m is built as [vectorNegativeReversed, vectorPositiveOriginal] the solution can be quite straightforward:
p = numel(m)/2;
m1 = m(p+1:end)
x1 = x(p+1:end) + x(p:-1:1)
y1 = y(p+1:end) + y(p:-1:1)
What about some flippy action:
m = [-4 -3 -2 2 3 4];
x = [2 5 6 7 9 1];
y = [10 23 34 54 27 32];
idx = find( (m > 0) );
xdi = find( ~(m > 0) );
m1 = m(idx)
x1 = fliplr( x(xdi) ) + x(idx)
y1 = fliplr( y(xdi) ) + y(idx)
returning:
m1 =
2 3 4
x1 =
13 14 3
y1 =
88 50 42
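Both answers rely on m being laid out as the negative values followed by the matching positive values. If that ordering cannot be assumed, a more general idea is to group the entries by abs(m) and sum within each group. Here is a minimal sketch of that idea in Python (not MATLAB); the name fold_by_abs is made up for this example.

```python
def fold_by_abs(m, values):
    """Sum the entries of values that share the same absolute value in m."""
    totals = {}
    for key, v in zip(m, values):
        totals[abs(key)] = totals.get(abs(key), 0) + v
    keys = sorted(totals)
    return keys, [totals[k] for k in keys]

m = [-4, -3, -2, 2, 3, 4]
x = [2, 5, 6, 7, 9, 1]
y = [10, 23, 34, 54, 27, 32]

m1, x1 = fold_by_abs(m, x)
_, y1 = fold_by_abs(m, y)
print(m1, x1, y1)
```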
