"no operator found" when asigning Sparse matrices results to sparse matrices - eigen3

I have a function that implements a minimization algorithm. I didn't include all the variables, just the matrices, to illustrate the types:
typedef Eigen::SparseMatrix<double> SpMat;
typedef Eigen::VectorXd Vec;
int lm_solver(void (*f_dz)(Vec* x_, int m, Vec* dz_, SpMat* W_),
              void (*f_H)(Vec* x_, SpMat* jac_, int n_, int m_),
              Vec* x, int nx, int mm, int nnz,
              double tol=1e-9, int max_iter = 100){

    SpMat A(mm, nx);
    SpMat H1(mm, nx);
    SpMat H2(mm, nx);
    SpMat H(mm, nx);
    SpMat W(mm, mm);
    Vec rhs(nx);
    Vec dz(nx);
    Vec dx(nx);
    Vec a(1);
    Vec b(1);
    double f, f_prev, lbmda, rho, nu, tau;
    bool updateH, converged;
    int iter_;

    // reserve matrices memory
    H.reserve(nnz);
    W.reserve(mm);

    while (!converged && iter_ < max_iter){

        // get the system matrices
        if (updateH){ // if the Jacobian computation is not locked...
            f_dz(x, mm, &dz, &W); // Residual increment (z-h(x)) vector creation or update: fill dz and W
            f_H(x, &H, nx, mm);   // Jacobian matrix creation or update: fill H

            // Start forming the auxiliary matrices of A
            H1 = H.transpose() * W;
            H2 = H1 * H;
        }

        // set the first value of lmbda
        if (iter_ == 1)
            lbmda = tau * H2.diagonal().maxCoeff();

        // form the system matrix A = H^t·W·H + lambda·I
        A = H2 + lbmda * Idn;

        // form the right hand side: H^t·W·dz
        rhs = H1 * dz;

        // Solve the increment: dx = solve(A, rhs);
        solver.compute(A);
        dx = solver.solve(rhs);

        // calculate the objective function: Least squares function
        a = 0.5 * dz * W * dz; // vector x matrix x vector -> vector of 1 element
        f = a.coeffRef(0);

        // calculate the gain ratio
        b = 0.5 * dx * (lbmda * dx - rhs); // vector x matrix x vector -> vector of 1 element
        rho = (f_prev - f) / b.coeffRef(0);
    }
    return 0;
}
The process does the following:
Declare sparse matrices (SpMat)
Reserve memory for the matrices
Call external functions to fill H, dz and W
Do matrix multiplications and store the results in intermediate matrices that are also sparse.
This function is the only function in a .h file that is compiled into a static library (.lib).
When I compile the static library alone, it compiles flawlessly.
However, when I use the library project from another project, I get the following error:
error: C2679: binary '=' : no operator found which takes a right-hand operand of type 'const Eigen::CwiseBinaryOp' (or there is no acceptable conversion)
\eigen\src/Core/Matrix.h(206): could be 'Eigen::Matrix<_Scalar,_Rows,_Cols> &Eigen::Matrix<_Scalar,_Rows,_Cols>::operator =(const Eigen::Matrix<_Scalar,_Rows,_Cols> &)'
with
[
_Scalar=double,
_Rows=-1,
_Cols=1
]
d:\proyectos\proyectos_i+d\ingrid\eigen\eigen_3_3_3\eigen\src/Core/Matrix.h(281): or 'Eigen::Matrix<_Scalar,_Rows,_Cols> &Eigen::Matrix<_Scalar,_Rows,_Cols>::operator =(Eigen::Matrix<_Scalar,_Rows,_Cols> &&)'
with
[
_Scalar=double,
_Rows=-1,
_Cols=1
]
while trying to match the argument list '(Vec, const Eigen::CwiseBinaryOp)'
This error flags the lines:
H1 = H.transpose() * W;
H2 = H1 * H;
rhs = H1 * dz;
b = 0.5 * dx * (lbmda * dx - rhs);
a = 0.5 * dz * W * dz;
I understand from this that I cannot store the result of sparse matrix multiplications in a new sparse matrix. I don't know the solution to this.
(I'm using Eigen 3.3.3)

I don't see exactly which lines cause your error, but it rather looks like it is caused by calculating a and b. You can't multiply a column vector by another column vector without transposing it, e.g.
b = 0.5 * dx.transpose() * (lbmda * dx - rhs);
However, this is actually a dot product, so you should just write
double b = 0.5 * dx.dot(lbmda * dx - rhs);
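Put together with the question's variables, both scalars can then be computed with .dot(); the following is only a sketch reusing the names from the question, not the full solver:
#include <Eigen/Dense>
#include <Eigen/Sparse>

// Objective value and gain ratio written with dot products instead of
// vector * vector expressions (a sketch using the question's variable names).
double gain_ratio(const Eigen::VectorXd& dz, const Eigen::VectorXd& dx,
                  const Eigen::VectorXd& rhs,
                  const Eigen::SparseMatrix<double>& W,
                  double lbmda, double f_prev)
{
    Eigen::VectorXd Wdz = W * dz;                   // sparse * dense -> dense vector
    double f     = 0.5 * dz.dot(Wdz);               // 0.5 * dz^T * W * dz
    double denom = 0.5 * dx.dot(lbmda * dx - rhs);  // 0.5 * dx^T * (lambda*dx - rhs)
    return (f_prev - f) / denom;
}
Here W * dz is a sparse-matrix times dense-vector product, which evaluates to a dense vector, so .dot() applies directly.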

The problem was that I wrote the whole function in the .h file.
By putting the body of the function in the .cpp, everything went fine.
This dichotomy of .h and .cpp is what annoys me the most about C++.
Anyway, for future reference.

Related

Logic for formula calculation

I need to compute the following problem in code:
x_0 = 2 and x_i = (-1/2) * x_(i-1) * sqrt(x_(i-1))
find the result of 1 / e^(x_1 + x_2 + x_3 + ...).
(Or, as marked-up text:)
Write a function of an appropriate type that calculates and returns the result of:
e^(x_1^(-1) - x_2^(-1) + x_3^(-1) - x_4^(-1) + ...), for n elements, defined as: x_0 = 2 and x_i = -½ · x_(i-1) · √|x_(i-1)|
It has to be done in C but I am just trying to figure out the logistics of it.
What I have thought of so far: x_0 has to be a variable initialized with 2, along with x_1. x_2, x_3, ... will be calculated in a recursive function n-1 times. I am not sure how the results should be stored: in a variable, or maybe an array? Would an array be appropriate?
Thank you.
Would it not be simpler to do it iteratively, like this? I'm not actually sure if this generates the correct answer, but this seems to be what your formula implies.
long double
compute(unsigned n)
{
    long double x = 2.0L;

    for (unsigned i = 0; i < n; ++i)
        x = (-(1.0L/2.0L) * x) * sqrtl(fabsl(x));

    return x;
}

Rotate a triangle around itself in C using three vertices

I need to rotate a triangle (called ship) around itself.
Here is what I got so far, but it doesn't work. It keeps getting smaller and smaller until it disappears.
void RotatePoint(Point *P, float angle)
{
    float theta = angle * (180/3.1415);
    P->x = (P->x * cos(theta)) - (P->y * sin(theta));
    P->y = (P->y * cos(theta)) + (P->x * sin(theta));
}

void RotateShip(Ship *ship)
{
    Rotate(&ship->A, rotateAngle);
    Rotate(&ship->B, rotateAngle);
    Rotate(&ship->C, rotateAngle);
}
Point P is the point I want to rotate, and Point C is the center of the triangle. I thought that if I rotate all three vertices, the triangle will rotate.
In my case, I initialize this way:
void initShip(Ship *ship)
{
    ship->center.x = (SCREEN_W)/2.0;
    ship->center.y = (SCREEN_H)/2.0;
    ship->A.x = 0;
    ship->A.y = -5;
    ship->B.x = 15;
    ship->B.y = 25;
    ship->C.x = -15;
    ship->C.y = 25;
    ship->color = al_map_rgb(255, 255, 255);
}
A, B and C are offsets from the center of the triangle. I draw the ship by adding A, B and C to the center vertex.
A=-0.699857,-19.963261
A=-0.000857,-19.951065
A=-0.699001,-19.914387
A=-0.001712,-19.902250
A=-0.698147,-19.865631
A=-0.002565,-19.853554
I'm pressing one key back and one key forth, making it rotate clockwise and anticlockwise. Notice how A is shrinking.
I don't know what I'm doing wrong. A should go back to 20.00 when it reaches the top; instead my triangle is shrinking.
I'm using cos(0.035) and sin(0.035), meaning 2 degrees.
The OP has a classic bug: using a temporary (or intermediate) value where the original/initial value should be used instead.
As a simplified example, consider a case where you have three variables, a, b, and c, and want to rotate their values one variable to the left:
a = b;
b = c;
c = a; /* Oops! Won't work! */
The last assignment is a problem, because a is no longer the original value! You cannot order the assignments in a way that would avoid this problem; the only thing that changes is which variable will suffer from the problem. To fix the problem, you need to use a new temporary variable to hold the original value:
t = a;
a = b;
b = c;
c = t;
In the OP's case, the ship structure should not mix the current shape of the ship and the true/unrotated shape of the ship in the same variables. Even if you avoid the above-mentioned problem, you'll still suffer from accumulated rounding errors; it might take hours of gameplay, but eventually your ship would end up looking different.
The solution is to describe the ship shape in separate variables, or using constants in the ship update function.
Let's say we have a variable dir that specifies the direction in radians, rotated counterclockwise from up, 0 being up (towards negative y axis), π/2 (and -3π/2) left (towards negative x axis), π (and -π) down, 3π/2 (and -π/2) right, and so on. If deg is in degrees, dir = deg * 3.14159265358979323846 / 180.0. We can also use the atan2() function to find out dir: dir = atan2(-x, y).
When dir = 0, OP wants A = { 0, -5 }, B = { 15, 25 }, and C = { -15, 25 }. If we define Adir = 3.14159, Ar = 5, Bdir = -0.54042, Br = sqrt(15*15+25*25) = 29.15476, Cdir = 0.54042, and Cr = 29.15476, then the ship vertices are
A.x = center.x + Ar*sin(dir + Adir);
A.y = center.y + Ar*cos(dir + Adir);
B.x = center.x + Br*sin(dir + Bdir);
B.y = center.y + Br*cos(dir + Bdir);
C.x = center.x + Cr*sin(dir + Cdir);
C.y = center.y + Cr*cos(dir + Cdir);
If the OP wants to fix the ship shape in the rotateShip function, then
void rotateShip(Ship *s, double rotateAngle)
{
    s->A.x = s->center.x + 5.00000 * sin(rotateAngle + 3.14159);
    s->A.y = s->center.y + 5.00000 * cos(rotateAngle + 3.14159);
    s->B.x = s->center.x + 29.15476 * sin(rotateAngle - 0.54042);
    s->B.y = s->center.y + 29.15476 * cos(rotateAngle - 0.54042);
    s->C.x = s->center.x + 29.15476 * sin(rotateAngle + 0.54042);
    s->C.y = s->center.y + 29.15476 * cos(rotateAngle + 0.54042);
}
Personally, I'd define the ship shape using a variable number of vertices:
typedef struct {
    double x;
    double y;
} vec2d;

typedef struct {
    vec2d center;
    size_t vertices;
    const vec2d *shape; /* Un-rotated ship vertices */
    double direction;   /* Ship direction, in radians */
    vec2d *vertex;      /* Rotated ship vertices */
} Ship;

const vec2d default_shape[] = {
    { 0.0, -5.0 },
    { -15.0, 25.0 },
    { 15.0, 25.0 },
};

void updateShip(Ship *ship)
{
    const double c = cos(ship->direction);
    const double s = sin(ship->direction);
    size_t i;

    for (i = 0; i < ship->vertices; i++) {
        ship->vertex[i].x = ship->center.x + c*ship->shape[i].x - s*ship->shape[i].y;
        ship->vertex[i].y = ship->center.y + s*ship->shape[i].x + c*ship->shape[i].y;
    }
}

void initShip(Ship *ship, const size_t vertices, const vec2d *shape)
{
    ship->center.x = 0.5 * SCREEN_W;
    ship->center.y = 0.5 * SCREEN_H;

    if (vertices > 2 && shape != NULL) {
        ship->vertices = vertices;
        ship->shape = shape;
    } else {
        ship->vertices = (sizeof default_shape) / (sizeof default_shape[0]);
        ship->shape = default_shape;
    }

    ship->direction = 0;

    ship->vertex = malloc(ship->vertices * sizeof ship->vertex[0]);
    if (!ship->vertex) {
        fprintf(stderr, "Out of memory.\n");
        exit(EXIT_FAILURE);
    }

    updateShip(ship);
}
In updateShip, we use a 2D rotation by ship->direction to rotate the ship model specified by the vertices in shape[], saving the rotated and translated coordinates to vertex[].
x_current = x_center + x_original * cos(direction) - y_original * sin(direction);
y_current = y_center + x_original * sin(direction) + y_original * cos(direction);
as defined in e.g. the Wikipedia article on rotation. Note that the original coordinates, x_original and y_original (or the values in the shape[] array in the Ship structure) are never modified.
This way you can let the player "upgrade" their ship by just changing the shape to point to a new ship shape, and vertices to reflect that number.
I can reproduce fast shrinking (while also rotating) with coordinates in int.
(It would have been so much easier based on an MCVE...)
With coordinates in float, it shrinks much slower, but it still shrinks.
I relate that to the fact that your implementation accumulates all math errors (which computers always make) in a very visible way.
In order to avoid shrinking altogether:
Do not manipulate the relative coordinates in order to rotate. Instead, store the relative coordinates as constants, together with the ship's orientation as an angle in double.
Then rotate by increasing/reducing the angle (wrapping around, to stay within -Pi ... +Pi).
Then draw by always applying the changing angle to the constant relative coordinates.
(I can only show you in detail, if you provide a MCVE.)
This way, the collected errors will only result in a slight and slowly growing misorientation,
which most likely will not be noticed by the pilot - and then be corrected by the pilot.
"Hmm, the ship has not yet completed the 360 I wanted. Oh well, I will turn a little more."
On a side note, I do not trust the way you use angles as parameters to cos() and sin().
Or to put it differently, I think it should be either
theta = angle * (180/3.1415); -> theta = angle; for a U-turn at angle = Pi, or
theta = angle * (180/3.1415); -> theta = angle * (3.1415/180); for a U-turn at angle = 180.
With your implementation you get a U-turn at angle = (Pi*3.1415/180), which I cannot see a reason for.
I also recommend using the appropriate constants from math.h (e.g. M_PI) instead of your own constant with 4 decimal places.

2D Perlin Noise looking odd

I'm not sure if my Perlin noise generator is functioning properly; the noise it generates looks very different from the images I see online. Mine looks too homogeneous (these are three different images):
Whereas what I usually see is something like:
My code is basically:
/* Get the coord of the top-left gradient of the grid (y, x) falls in */
int j = floor(x);
int i = floor(y);
/* Get the distance (y, x) is from it */
double dx = x-j;
double dy = y-i;
/* Influence of (g)radient(i)(j) (starting at the top-left one) */
double g00 = dot(grad(hashes, hsize, grads, i, j), dy, dx);
double g01 = dot(grad(hashes, hsize, grads, i, j+1), dy, dx-1);
double g10 = dot(grad(hashes, hsize, grads, i+1, j), dy-1, dx);
double g11 = dot(grad(hashes, hsize, grads, i+1, j+1), dy-1, dx-1);
/* Interpolate the influences using the blending function */
/* Linear interpol the top 2 */
double lt = lerp(g00, g01, fade(dx));
/* Linear interpol the bottom 2 */
double lb = lerp(g10, g11, fade(dx));
/* Linear interpol lb lt, completing the bilinear interpol */
return lerp(lt, lb, fade(dy));
Complete code. It's based mainly on this tutorial. I'm using this script to draw the csv file.
I understand the basics, but after reading several "tutorials" that usually contradict each other, and the "reference implementation", which is not very readable, I have a few doubts. In what interval should the (x, y) points being interpolated be? As I understand it, it should be [0, GRID_SIZE-1] (e.g. [0, 255] if using a pre-computed table with 256 random values). However, my code only produces reasonably good-looking images when (x, y) is mapped to [0, 1], and I see some implementations online that map it to [0, 255] no matter the grid size. I'm also unsure whether I'm picking the gradients correctly from the table.
You normalize your pixel coordinates to the whole image. You should normalize them to the size of your simplex grid.
So instead of your code for the inner loop:
double x = j/(double)w;
double y = i/(double)h;
do:
double x = j / gridsize;
double y = i / gridsize;
where the grid size is an additional parameter, for example:
double gridsize = 32.0;
(It should probably be chosen to fit evenly into the image dimensions.)
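A minimal sketch of the adjusted outer loop (perlin2d stands in for the OP's own noise function, and the output buffer is just illustrative):
#include <vector>

double perlin2d(double x, double y);    // hypothetical: the OP's noise function

void render_noise(std::vector<double>& out, int w, int h, double gridsize)
{
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < w; ++j) {
            double x = j / gridsize;    // normalize to the grid cell size, not the whole image
            double y = i / gridsize;
            out[i * w + j] = perlin2d(x, y);
        }
}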

Symmetric Matrix Inversion in C using CBLAS/LAPACK

I am writing an algorithm in C that requires matrix and vector multiplications. I have a matrix Q (W x W) which is created by multiplying the transpose of a vector J (1 x W) with itself and adding the identity matrix I scaled by a scalar a.
Q = [(J^T) * J + aI].
I then have to multiply the inverse of Q with vector G to get vector M.
M = (Q^(-1)) * G.
I am using cblas and clapack to develop my algorithm. When matrix Q is populated with random numbers (type float) and inverted using the routines sgetrf_ and sgetri_, the calculated inverse is correct.
But when matrix Q is symmetric, which is the case when you multiply (J^T) x J, the calculated inverse is wrong!
I am aware of the row-major (in C) and column-major (in Fortran) array formats when calling LAPACK routines from C, but for a symmetric matrix this should not be a problem, as A^T = A.
I have attached my C function code for matrix inversion below.
I am sure there is a better way to solve this. Can anyone help me with this?
A solution using cblas would be great...
Thanks.
void InverseMatrix_R(float *Matrix, int W)
{
    int LDA = W;
    int IPIV[W];
    int ERR_INFO;
    int LWORK = W * W;
    float Workspace[LWORK];

    // - Compute the LU factorization of a M by N matrix A
    sgetrf_(&W, &W, Matrix, &LDA, IPIV, &ERR_INFO);

    // - Generate inverse of the matrix given its LU decomposition
    sgetri_(&W, Matrix, &LDA, IPIV, Workspace, &LWORK, &ERR_INFO);

    // - Display the inverted matrix
    PrintMatrix(Matrix, W, W);
}

void PrintMatrix(float* Matrix, int row, int colm)
{
    int i, k;

    for (i = 0; i < row; i++)
    {
        for (k = 0; k < colm; k++)
        {
            printf("%g, ", Matrix[i*colm + k]);
        }
        printf("\n");
    }
}
I don't know BLAS or LAPACK, so I have no idea what may cause this behaviour.
But, for matrices of the given form, calculating the inverse is quite easy. The important fact for this is
(J^T*J)^2 = (J^T*J)*(J^T*J) = J^T*(J*J^T)*J = <J|J> * (J^T*J)
where <u|v> denotes the inner product (if the components are real - the canonical bilinear form for complex components, but then you'd probably consider not the transpose but the conjugate transpose, and you'd be back at the inner product).
Generalising,
(J^T*J)^n = (<J|J>)^(n-1) * (J^T*J), for n >= 1.
Let us denote the symmetric square matrix (J^T*J) by S and the scalar <J|J> by q. Then, for general a != 0 of sufficiently large absolute value (|a| > q):
(a*I + S)^(-1) = 1/a * (I + a^(-1)*S)^(-1)
               = 1/a * (I + ∑_{k>0} (-1)^k * a^(-k) * S^k)
               = 1/a * (I + (∑_{k>0} (-1)^k * a^(-k) * q^(k-1)) * S)
               = 1/a * (I - 1/(a+q)*S)
               = 1/a*I - 1/(a*(a+q))*S
That formula holds (by analyticity) for all a except a = 0 and a = -q, as can be verified by calculating
(a*I + S) * (1/a*I - 1/(a*(a+q))*S) = I + 1/a*S - 1/(a+q)*S - 1/(a*(a+q))*S^2
= I + 1/a*S - 1/(a+q)*S - q/(a*(a+q))*S
= I + ((a+q) - a - q)/(a*(a+q))*S
= I
using S^2 = q*S.
That calculation is also much simpler and more efficient than first finding the LU decomposition.
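Assuming J and G are length-W float arrays with a != 0 and a != -<J|J>, a sketch of computing M = Q^(-1) * G directly from that closed form with cblas could look like this (the function name is illustrative):
#include <cblas.h>

/* M = (a*I + J^T*J)^(-1) * G = G/a - (J.G)/(a*(a+q)) * J, where q = <J|J>. */
void solve_shifted_outer(const float *J, const float *G, float a, float *M, int W)
{
    float q  = cblas_sdot(W, J, 1, J, 1);              /* q = <J|J>  */
    float jg = cblas_sdot(W, J, 1, G, 1);              /* J . G      */

    cblas_scopy(W, G, 1, M, 1);                        /* M = G      */
    cblas_sscal(W, 1.0f / a, M, 1);                    /* M = G / a  */
    cblas_saxpy(W, -jg / (a * (a + q)), J, 1, M, 1);   /* M -= (J.G)/(a*(a+q)) * J */
}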
You may want to try Armadillo, which is an easy-to-use C++ wrapper for LAPACK. It provides several inverse-related functions:
inv(), general inverse, with an optional speedup for symmetric positive definite matrices
pinv(), pseudo-inverse
solve(), solve a system of linear equations (that can be over- or under-determined), without doing the actual inverse
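For instance, a minimal sketch for the M = Q^(-1) * G step (assuming single-precision matrices to match the question; solve() avoids forming the explicit inverse):
#include <armadillo>

arma::fvec compute_M(const arma::fmat& Q, const arma::fvec& G)
{
    // Solve Q * M = G directly rather than computing inv(Q) first.
    return arma::solve(Q, G);
}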
Here is an example of 3x3 matrix inversion; visit sgetri.f for more.
//__CLPK_integer is typedef of int
//__CLPK_real is typedef of float
__CLPK_integer ipiv[3];
{
    //Compute LU lower upper factorization of matrix
    __CLPK_integer m=3;
    __CLPK_integer n=3;
    __CLPK_real *a=(float *)this->m1;
    __CLPK_integer lda=3;
    __CLPK_integer info;

    sgetrf_(&m, &n, a, &lda, ipiv, &info);
}
{
    //compute inverse of a matrix
    __CLPK_integer n=3;
    __CLPK_real *a=(float *)this->m1;
    __CLPK_integer lda=3;
    __CLPK_real work[3];
    __CLPK_integer lwork=3;
    __CLPK_integer info;

    sgetri_(&n, a, &lda, ipiv, work, &lwork, &info);
}

Matrix Multiplication CUDA [duplicate]

This question already has an answer here:
CUDA Matrix Multiplication write to wrong memory location
(1 answer)
Closed 2 years ago.
I have been reading through several websites and even used NVIDIA's code as a guide, but I am still getting the wrong answer. The main function asks the user for the size, displays A and B, and then displays the resulting matrix C. However, say I run a 2x2 matrix for both A and B; this is my sample output:
Matrix A
0.000000 8.000000
2.000000 2.000000
Matrix B
3.000000 1.000000
5.000000 7.000000
Matrix C (Results)
0.000000 9.000000
7.000000 4.000000
But that's incorrect. It should be:
40.000 56.000
16.000 16.000
I changed it from decimals to whole numbers so that it would be easier to check, and I found that it's incorrect. I do not understand why it would be incorrect, especially since I took it right from their code sample.
#ifndef _MATRIXMUL_KERNEL_H_
#define _MATRIXMUL_KERNEL_H_

#include <stdio.h>

// Thread block size
#define BLOCK_SIZE 16
#define TILE_SIZE  16

// CUDA Kernel
__global__ void matrixMul( float* C, float* A, float* B, int wA, int wB)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;

    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Index of the first sub-matrix of A processed
    // by the block
    int aBegin = wA * BLOCK_SIZE * by;

    // Index of the last sub-matrix of A processed
    // by the block
    int aEnd = aBegin + wA - 1;

    // Step size used to iterate through the
    // sub-matrices of A
    int aStep = BLOCK_SIZE;

    // Index of the first sub-matrix of B processed
    // by the block
    int bBegin = BLOCK_SIZE * bx;

    // Step size used to iterate through the
    // sub-matrices of B
    int bStep = BLOCK_SIZE * wB;

    float Csub = 0;

    // Loop over all the sub-matrices of A and B
    // required to compute the block sub-matrix
    for (int a = aBegin, b = bBegin; a <= aEnd; a += aStep, b += bStep)
    {
        // Declaration of the shared memory array As
        // used to store the sub-matrix of A
        __shared__ float As[BLOCK_SIZE][BLOCK_SIZE];

        // Declaration of the shared memory array Bs
        // used to store the sub-matrix of B
        __shared__ float Bs[BLOCK_SIZE][BLOCK_SIZE];

        // Load the matrices from global memory
        // to shared memory; each thread loads
        // one element of each matrix
        As[ty][tx] = A[a + wA * ty + tx];
        Bs[ty][tx] = B[b + wB * ty + tx];

        // Synchronize to make sure the matrices
        // are loaded
        __syncthreads();

        // Multiply the two matrices together;
        // each thread computes one element
        // of the block sub-matrix
        for (int k = 0; k < BLOCK_SIZE; ++k)
            Csub += As[ty][k] * Bs[k][tx];

        // Synchronize to make sure that the preceding
        // computation is done before loading two new
        // sub-matrices of A and B in the next iteration
        __syncthreads();
    }

    // Write the block sub-matrix to device memory;
    // each thread writes one element
    int c = wB * BLOCK_SIZE * by + BLOCK_SIZE * bx;
    C[c + wB * ty + tx] = Csub;
}

#endif // #ifndef _MATRIXMUL_KERNEL_H_
host code:
//perform the calculation
//setup execution parameters
dim3 threads(BLOCK_SIZE, BLOCK_SIZE);
dim3 grid(c.colSize / threads.x, c.rowSize / threads.y);
// execute the kernel
matrixMul<<< grid, threads >>>(deviceMatrixC, deviceMatrixA, deviceMatrixB, a.colSize, b.colSize);
Thanks for your help,
Dan
The code you are using implicitly requires that the sizes of the matrices are round multiples of the block size (16x16 in this case). The inner-product calculation processes a tile width at a time without checking for out-of-bounds memory access. For this reason, 2x2 matrices will not work.
If you try running the kernel with a 16x16 input (for example, zero-padding your 2x2 case to 16x16), you should be able to confirm the result.
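A host-side sketch of that zero-padding idea in plain C++ (names are illustrative; after multiplying the padded matrices with the unchanged kernel, the true product is the top-left r x c block of the padded result):
#include <vector>

// Copy an r x c row-major matrix into the top-left corner of an R x C matrix,
// where R and C are r and c rounded up to multiples of blockSize; the rest is zero.
std::vector<float> padToBlockSize(const std::vector<float>& M, int r, int c,
                                  int blockSize, int& R, int& C)
{
    R = ((r + blockSize - 1) / blockSize) * blockSize;
    C = ((c + blockSize - 1) / blockSize) * blockSize;
    std::vector<float> P(static_cast<size_t>(R) * C, 0.0f);
    for (int i = 0; i < r; ++i)
        for (int j = 0; j < c; ++j)
            P[i * C + j] = M[i * c + j];
    return P;
}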
