Converting an Armadillo sp_mat to an Eigen SparseMatrix<double> and vice versa - sparse-matrix

I have Rcpp code written with Armadillo. I would like to use the Eigen library to do least squares estimation with a sparse matrix A in the equation Ax=b.
The question is: how can I convert from an Armadillo sp_mat to an Eigen SparseMatrix and vice versa?
There is a similar question for dense matrices: Converting an Armadillo Matrix to an Eigen MatrixXd and vice versa
EDIT: mockup file
In the file mockup.cpp:
#include <RcppArmadillo.h>
#include <RcppEigen.h>
#include <iostream>
// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]
// the function logit is a demo in Rcpp
// [[Rcpp::export]]
Rcpp::List logit_(arma::sp_mat A, Eigen::SparseMatrix<double> B){
    // arma documentation: http://arma.sourceforge.net/docs.html#memptr
    int numA = A.n_rows;
    arma::sp_mat out_A = A.t();
    // eigen documentation: http://eigen.tuxfamily.org/dox/group__QuickRefPage.html
    int numB = B.rows();
    Eigen::SparseMatrix<double> out_B = B.transpose();
    return Rcpp::List::create(Rcpp::Named("out_A") = out_A,
                              Rcpp::Named("out_B") = out_B);
}

Related

How to use openBLAS to improve vectorized operations?

I am teaching myself how to write efficient, optimized deep learning code, but I am very much a newbie at this.
For example: I am reading that numpy uses vectorization to avoid python loops.
According to that link, they have also pretty much coined the term broadcasting, which is used by TensorFlow, PyTorch and others.
I did some digging, and found that ldd on my Debian box shows that multiarray.so links against libopenblasp-r0-39a31c03.2.18.so.
So let's take the use case of a matrix subtraction. I would like to understand how to use OpenBLAS to improve this very naive implementation:
void matrix_sub(Matrix *a, Matrix *b, Matrix *res)
{
    assert(a->cols == b->cols);
    assert(a->rows == b->rows);
    zero_out_data(res, a->rows, a->cols);
    for (int i = 0; i < (a->rows * a->cols); i++)
    {
        res->data[i] = a->data[i] - b->data[i];
    }
}
Likewise an inner product, or an addition?

STM32F Discovery - Undefined reference to arm_sin_f32

I'm new to programming the STM32F Discovery board. I followed the instructions here and managed to get the blinky LED example working.
But now I'm trying to play an audio tone, for which I have borrowed code from here. In my Makefile I have included CFLAGS += -lm, which is where I understood arm_sin_f32 to be defined.
This is the code for main.c:
#define USE_STDPERIPH_DRIVER
#include "stm32f4xx.h"
#define ARM_MATH_CM4
#include <arm_math.h>
#include <math.h>
#include "speaker.h"
//Quick hack, approximately 1ms delay
void ms_delay(int ms)
{
    while (ms-- > 0) {
        volatile int x = 5971;
        while (x-- > 0)
            __asm("nop");
    }
}

volatile uint32_t msTicks = 0;

// SysTick Handler (every time the interrupt occurs, this is called)
void SysTick_Handler(void){ msTicks++; }

// initialize the system tick
void InitSystick(void){
    SystemCoreClockUpdate();
    // division occurs in terms of seconds... divide by 1000 to get ms, for example
    if (SysTick_Config(SystemCoreClock / 10000)) { while (1); } // update every 0.0001 s, aka 10kHz
}

//Flash orange LED at about 1hz
int main(void)
{
    SystemInit();
    InitSystick();
    init_speaker();
    int16_t audio_sample;
    int loudness = 250;
    float audio_freq = 440;
    audio_sample = (int16_t) (loudness * arm_sin_f32(audio_freq*msTicks/10000));
    send_to_speaker(audio_sample);
}
But when trying to run make I get the following error:
main.c:42: undefined reference to `arm_sin_f32'
By using -lm, you're linking to libc's math library, which for floating points provides you with
https://www.gnu.org/software/libc/manual/html_node/Trig-Functions.html
Function: double sin (double x)
Function: float sinf (float x)
Function: long double sinl (long double x)
Function: _FloatN sinfN (_FloatN x)
Function: _FloatNx sinfNx (_FloatNx x)
Preliminary: | MT-Safe | AS-Safe | AC-Safe | See POSIX Safety Concepts.
These functions return the sine of x, where x is given in radians. The return value is in the range -1 to 1.
You'll want to use sinf as you're using a float.
If you'd like to use arm_sin_f32, then you should link to CMSIS's dsp library.
https://www.keil.com/pack/doc/CMSIS/DSP/html/group__sin.html
float32_t arm_sin_f32 (float32_t x)
Fast approximation to the trigonometric sine function for floating-point
data.
You should link to the appropriate precompiled library as detailed in: CMSIS DSP Software Library
The latest version of CMSIS at this moment is available at:
https://github.com/ARM-software/CMSIS_5
I don't think you should simply copy the C files, as that will 'pollute' your own project and make updating hard.
Simply download the latest release and add the following to your Makefile:
CMSISPATH = "C:/path/to/cmsis/top/directory"
CFLAGS += -I$(CMSISPATH)/CMSIS/DSP/Include
LDFLAGS += -L$(CMSISPATH)/CMSIS/Lib/GCC/ -larm_cortexM4lf_math
First of all, arm_sin_32 does not exist; arm_sin_f32, for example, does, and there are several other variants as well. You need to add the appropriate C file from CMSIS to your project, for example: CMSIS/DSP/Source/FastMathFunctions/arm_sin_f32.c
I would suggest not using the copy shipped with Keil, as it is probably outdated; just download the most current version of CMSIS from GitHub.
The arm_... functions are not part of the m library.
Do not use nops for the delay: they are instantly flushed out of the pipeline without being executed, and are only meant for padding.

Enabling HVX SIMD in Hexagon DSP by using instruction intrinsics

I was using Hexagon-SDK 3.0 to compile my sample application for the HVX DSP architecture. There are many Hexagon-LLVM tools available, located in:
~/Qualcomm/HEXAGON_Tools/7.2.12/Tools/bin
I wrote a small example to calculate the product of two arrays to make sure I can utilize the HVX hardware acceleration. However, when I generate my assembly, either with -S or with -S -emit-llvm, I don't find any HVX instructions such as vmem, vX, etc. My C application runs on hexagon-sim for now, until I manage to find a way to run it on the board as well.
As far as I understood, I need to write the HVX part of the code with C intrinsics, but I was not able to adapt the existing examples to my own needs. It would be great if somebody could demonstrate how this is done. Also, many of the intrinsic instructions are not defined in the Hexagon V62 Programmer's Reference Manual.
Here is my small app in pure C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#if defined(__hexagon__)
#include "hexagon_standalone.h"
#include "subsys.h"
#endif
#include "io.h"
#include "hvx.cfg.h"
#define KERNEL_SIZE 9
#define Q 8
#define PRECISION (1<<Q)
double vectors_dot_prod2(const double *x, const double *y, int n)
{
    double res = 0.0;
    int i = 0;
    for (; i <= n-4; i += 4)
    {
        res += (x[i] * y[i] +
                x[i+1] * y[i+1] +
                x[i+2] * y[i+2] +
                x[i+3] * y[i+3]);
    }
    for (; i < n; i++)
    {
        res += x[i] * y[i];
    }
    return res;
}
int main (int argc, char* argv[])
{
    int n = 4; // number of elements; must be set before the allocations below
    long long start_time, total_cycles;
    /* -----------------------------------------------------*/
    /* Allocate memory for input/output                     */
    /* -----------------------------------------------------*/
    //double *res = memalign(VLEN, 4 * sizeof(double));
    double *x = memalign(VLEN, n * sizeof(double));
    double *y = memalign(VLEN, n * sizeof(double));
    if ( x == NULL || y == NULL ){
        printf("Error: Could not allocate memory for image\n");
        return 1;
    }
#if defined(__hexagon__)
    subsys_enable();
    SIM_ACQUIRE_HVX;
#if LOG2VLEN == 7
    SIM_SET_HVX_DOUBLE_MODE;
#endif
#endif
    /* -----------------------------------------------------*/
    /* Call function                                        */
    /* -----------------------------------------------------*/
    RESET_PMU();
    start_time = READ_PCYCLES();
    vectors_dot_prod2(x,y,n);
    total_cycles = READ_PCYCLES() - start_time;
    DUMP_PMU();
    printf("Array product of x[i] * y[i] = %f\n", vectors_dot_prod2(x,y,4));
#if defined(__hexagon__)
    printf("AppReported (HVX%db-mode): Array product of x[i] * y[i] = %f\n", VLEN, vectors_dot_prod2(x,y,4));
#endif
    return 0;
}
I compile it using hexagon-clang:
hexagon-clang -v -O2 -mv60 -mhvx-double -DLOG2VLEN=7 -I../../common/include -I../include -DQDSP6SS_PUB_BASE=0xFE200000 -o arrayProd.o -c arrayProd.c
Then link it with subsys.o (is found in DSK and already compiled) and -lhexagon to generate my executable:
hexagon-clang -O2 -mv60 -o arrayProd.exe arrayProd.o subsys.o -lhexagon
Finally, run it using the sim:
hexagon-sim -mv60 arrayProd.exe
A bit late, but might still be useful.
Hexagon Vector eXtensions are not emitted automatically, and the current instruction set (as of the 8.0 SDK) only supports integer manipulation, so the compiler will not emit anything vectorized for C code containing the "double" type (it is similar to SSE programming: you have to manually pack xmm registers and use SSE intrinsics to do what you need).
You need to define what your application really requires.
E.g., if you are writing something 3D-related and really need to calculate double (or float) dot products, you might convert your floats to 16.16 fixed point and then use instructions (i.e., C intrinsics) like Q6_Vw_vmpyio_VwVh and Q6_Vw_vmpye_VwVuh to emulate fixed-point multiplication.
To "enable" HVX you should use HVX-related types defined in
#include <hexagon_types.h>
#include <hexagon_protos.h>
The instructions like 'vmem' and 'vmemu' are emitted automatically for statements like
// I assume 64-byte mode, no `-mhvx-double`. For 128-byte mode use 32 int array
int values[16] = { 1, 2, 3, ..... };
/* The following line compiles to
{
r4 = __address_of_values
v1 = vmem(r4 + #0)
}
You can get the exact code by using '-S' switch, as you already do
*/
HVX_Vector v = *(HVX_Vector*)values;
Your (fixed-point) version of dot_product may read out 16 integers at a time, multiply all 16 integers in a couple of instructions (see HVX62 programming manual, there is a tip to implement 32-bit integer multiplication from 16-bit one),
then shuffle/deal/ror data around and sum up rearranged vectors to get dot product (this way you may calculate 4 dot products almost at once and if you preload 4 HVX registers - that is 16 4D vectors - you may calculate 16 dot products in parallel).
If what you are doing is really just byte/int image processing, you might use specific 16-bit and 8-bit hardware dot products in Hexagon instruction set, instead of emulating doubles and floats.

Armadillo: eigs_gen for smallest eigenvalue

I'm using armadillo's eigs_gen to find the smallest algebraic eigenvalue of a sparse matrix.
If I request the function for just the smallest eigenvalue the result is incorrect but if I request it for the 2 smallest eigenvalues the result is correct. The code is:
#include <iostream>
#include <armadillo>
using namespace std;
using namespace arma;
int main(int argc, char** argv)
{
    cout << "Armadillo version: " << arma_version::as_string() << endl;
    sp_mat A(5,5);
    A(1,2) = -1;
    A(2,1) = -1;
    A(3,4) = -1;
    A(4,3) = -1;
    cx_vec eigval;
    cx_mat eigvec;
    eigs_gen(eigval, eigvec, A, 1, "sr"); // find smallest eigenvalue ---> INCORRECT RESULTS
    eigval.print("Smallest real eigval:");
    eigs_gen(eigval, eigvec, A, 2, "sr"); // find 2 smallest eigenvalues ---> ALMOST CORRECT RESULTS
    eigval.print("Two smallest real eigvals:");
    return 0;
}
My compile command is:
g++ file.cpp -o file.exe -O2 -I/path-to-armadillo/armadillo-4.600.3/include -DARMA_DONT_USE_WRAPPER -lblas -llapack -larpack
The output is:
Armadillo version: 4.600.3 (Off The Reservation)
Smallest real eigval:
(+1.000e+00,+0.000e+00)
Two smallest real eigvals:
(-1.000e+00,+0.000e+00)
(-1.164e-17,+0.000e+00)
Any idea on why this is happening and how to overcome this is appreciated.
Note: the second result is only almost correct, because we expect -1, -1 as the two lowest eigenvalues; perhaps repeated eigenvalues are ignored.
Update: including a test matrix construction which, after ryan's changes to include the "sa" option to the library, doesn't seem to converge:
#define ARMA_64BIT_WORD
#include <armadillo>
#include <iostream>
#include <vector>
#include <stdio.h>
using namespace arma;
using namespace std;
int main(){
    size_t l(3), ls(l*l*l);
    sp_mat A = sprandn<sp_mat>(ls, ls, 0.01);
    sp_mat B = A.t()*A;
    vec eigval;
    mat eigvec;
    eigs_sym(eigval, eigvec, B, 1, "sa");
    return 0;
}
The matrix sizes of interest are much larger, e.g. ls = 8000 to 27000, and the matrix is not quite the one constructed here, but I presume the problem is the same.
I believe that the issue here is that you are running eigs_gen() (which calls DNAUPD) on a symmetric matrix. ARPACK notes that DNAUPD is not meant for symmetric matrices, but does not specify what will happen if you use symmetric matrices anyway:
NOTE: If the linear operator "OP" is real and symmetric with respect to the real positive semi-definite symmetric matrix B, i.e. B*OP = (OP')*B, then subroutine ssaupd should be used instead.
(from http://www.mathkeisan.com/usersguide/man/dnaupd.html )
I modified the internal Armadillo code to pass "sa" (smallest algebraic) to the ARPACK calls in eigs_sym() (sp_auxlib_meat.hpp), and I was able to obtain the correct eigenvalues. I've submitted a patch upstream to make "sa" and "la" support available for eigs_sym(), which I think should solve your problem once a new version is released (or at some point in the future).
The problem is with repeated eigenvalues; if I change the first two matrix elements to
A(1,2) = -1.00000001;
A(2,1) = -1.00000001;
the expected results are obtained.

How to use vlfeat sift matching function in C code?

I just found one similar question here, but I just want to do matching based on the descriptors computed by vlfeat. The goal is to detect whether an image contains the object from another image, based on SIFT feature description extraction and matching. And I need to do it in C, not MATLAB.
So how can I call vl_ubcmatch function in C code?
So how can I call vl_ubcmatch function in C code?
This is a MEX function which is only intented to be called from MATLAB. You cannot re-use it as-is from a general purpose C program.
The goal to detect if an image contains the object in another image [...] How to do SIFT matching algorithm if I use vlfeat?
VLFeat's C API does not provide SIFT matching functions out of the box. So basically you need to adapt the so-called ratio test [1] from the corresponding MATLAB MEX code section, which is fairly easy (see below).
The main drawback if you want to perform robust matching is that this function does not take the geometry into account, i.e. the keypoint coordinates.
What you need in addition is a geometric consistency check, which is typically performed by figuring out if there is a homography between the two images (using the descriptor correspondences obtained with the ratio test as input). This is done with an algorithm like RANSAC, since the correspondences may include outliers.
You can also speed up the correspondence computation with a kd-tree.
So an alternative if you need a plain C implementation is relying on Open SIFT by Rob Hess which includes everything you need, as well as a ready-to-use command-line tool (and thus example) of matching:
See match.c.
#include <float.h>

typedef struct {
    int k1;
    int k2;
    double score;
} Pair;

Pair *
compare(Pair *pairs,
        const float *descr1,
        const float *descr2,
        int K1,
        int K2,
        int ND,
        float thresh)
{
    int k1, k2;
    /* Loop over 1st image descr. */
    for (k1 = 0; k1 < K1; ++k1, descr1 += ND) {
        float best = FLT_MAX;
        float second_best = FLT_MAX;
        int bestk = -1;
        /* Loop over 2nd image descr. and find the 1st and 2nd closest descr. */
        for (k2 = 0; k2 < K2; ++k2, descr2 += ND) {
            int bin;
            float acc = 0;
            /* Compute the squared L2 distance between descriptors */
            for (bin = 0; bin < ND; ++bin) {
                float delta = descr1[bin] - descr2[bin];
                acc += delta*delta;
                if (acc >= second_best)
                    break;
            }
            if (acc < best) {
                second_best = best;
                best = acc;
                bestk = k2;
            }
            else if (acc < second_best) {
                second_best = acc;
            }
        }
        /* Rewind */
        descr2 -= ND*K2;
        /* Record the correspondence if the best descr. passes the ratio test */
        if (thresh * best < second_best && bestk != -1) {
            pairs->k1 = k1;
            pairs->k2 = bestk;
            pairs->score = best;
            pairs++;
        }
    }
    return pairs;
}
K1: number of descriptors in image 1,
K2: number of descriptors in image 2,
ND: descriptor dimension (= 128 for SIFT),
descr1 and descr2: descriptors of image 1 and 2 resp., stored in row-major order (e.g. K1 rows x ND columns),
thresh: ratio test threshold value, e.g 1.5 in MATLAB code.
[1] see 7.1 Keypoint Matching from D. Lowe's paper.
You will use the vlfeat library the same way you use any other C library. First make sure the library is installed on your computer and you know where it is installed. You will need to include the required header for each part of vlfeat you are using: generally a generic library header for vlfeat, plus a specific header for SIFT (e.g. #include "sift.h"); sometimes there is no general header. You will also need to make sure the gcc or g++ command includes the proper INCLUDE_PATH and LIBRARY_PATH for your environment so that gcc can find your vlfeat files (e.g. -I/path/to/dir/holding_sift.h and -L/path/to/vlfeatlib). So you will end up with something like this for C:
gcc -o exename exename.c -I/path/to/dir/holding_sift.h -L/path/to/vlfeatlib -lvl
There is documentation online that will help; see: how to set up a basic C++ project which uses the VLFeat library. If you have further questions, just drop a line in the comments.
