Indexing slice from 3D Rcpp NumericVector - arrays

Hi I have what I think must be a really simple Rcpp question regarding treating NumericVector objects as multidimensional arrays. I can't find an answer to what might be obvious. Apologies up front if this is the case -- my inexperience with C++ is to blame...
If I use the answer posted here (Constructing 3D array in Rcpp) as an example:
library("Rcpp")
cppFunction(code='
NumericVector arrayC(NumericVector input, IntegerVector dim) {
input.attr("dim") = dim;
return input;
}
')
How do I extract/access a single slice / row / column out of the "input" object?
I.e. Do something like
NumericMatrix X = input(_,_,i)
// FYI -- I know this doesn't work! Simply trying to convey the point...
And yes, I know RcppArmadillo could be used. I have my reasons for doing things this way, but no need to bore folks with them.
Thanks.

Rcpp11 has Array for this, templated with both the dimension depth and the R type.
For example, you could do:
#include <Rcpp.h>
using namespace Rcpp ;
typedef Array<3,REALSXP> Numeric3D ;
// [[Rcpp::export]]
Numeric3D test(){
Numeric3D res(2,3,4) ;
for( int i=0; i<2; i++)
for( int j=0; j<3; j++)
for( int k=0; k<4; k++)
res(i,j,k) = i+j+k ;
return res ;
}
/*** R
test()
*/
All the relevant indexing logic is in the Index class template. The implementation uses C++11 variadic templates.
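For illustration only, here is the kind of arithmetic such an indexer has to perform, sketched in plain C rather than taken from Rcpp11. It assumes R's column-major layout (the first index varies fastest), and the names offset3 and copy_slice are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Offset of element (i, j, k) in a column-major array with
   dimensions n1 x n2 x n3: the first index varies fastest. */
size_t offset3(size_t i, size_t j, size_t k, size_t n1, size_t n2)
{
    return i + n1 * (j + n2 * k);
}

/* Copy slice k (an n1 x n2 matrix) out of the flat buffer.
   In column-major order a whole slice is contiguous, so one
   memcpy is enough. */
void copy_slice(const double *arr, size_t n1, size_t n2, size_t k,
                double *out)
{
    memcpy(out, arr + k * n1 * n2, n1 * n2 * sizeof *out);
}
```

Under the same assumption, a slice along the last dimension is the cheap case; extracting a single row or column across slices needs a strided loop over offset3 instead of one memcpy.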

Everything I wrote in the previous answer you cite still holds: doable, but possibly painful as you may need to write converters. Contributions would still be welcome.
For what it is worth, I use the (Rcpp)Armadillo containers for three-dimensional data as they do have the slicing operators. Note that you can't easily convert them to something R likes, i.e. I think we still lack automated converters from cubes to lists of matrices.
Edit: For what it is worth, here is a short loop from a recent GitHub project of mine:
for (unsigned int j=k-1-1; j>0; j--) {
arma::mat Ppred = AA.slice(j) * P.slice(j) * AA.slice(j).t() + QQ.slice(j);
arma::mat lhs = (P.slice(j) * AA.slice(j).t());
arma::mat rhs = Ppred;
D.slice(j) = arma::solve(rhs.t(), lhs.t()).t();
M.col(j) = M.col(j) + D.slice(j) * (M.col(j+1) - AA.slice(j) * M.col(j));
P.slice(j) = P.slice(j) + D.slice(j) *
(P.slice(j+1) - Ppred) * D.slice(j).t();
}
This uses Armadillo slicing on both the left- and right-hand sides, and it works rather well from R thanks to RcppArmadillo (modulo the aforementioned issue: because R has no real native 3-d structure, we can't pass a 3-d matrix back easily).

Related

Ruby C API - From ruby array to C array

I am passing an array (matrix) from Ruby to a C function. At the moment I am using the following code
VALUE matmat_mul(VALUE self, VALUE matrixA, VALUE matrixB)
{
int rowsA = RARRAY_LEN(matrixA);
VALUE firstElement = rb_ary_entry(matrixA, 0);
int colsA = RARRAY_LEN(firstElement);
int rowsB = RARRAY_LEN(matrixB);
firstElement = rb_ary_entry(matrixB, 0);
int colsB = RARRAY_LEN(firstElement);
int i,j;
double *matA = (double *)malloc(rowsA * colsA * sizeof(double));
double *matB = (double *)malloc(rowsB * colsB * sizeof(double));
VALUE rowA;
for (i=0; i<rowsA; i++)
{
rowA = rb_ary_entry(matrixA, i);
for (j=0; j<colsA; j++)
{
matA[i * colsA + j] = NUM2DBL(rb_ary_entry( rowA, j));
}
}
// same for matrix B
....
....
// Perform operation C = A x B
VALUE matrixC = rb_ary_new2(rowsC);
VALUE rowC;
for (i=0; i<rowsC; i++) {
rowC = rb_ary_new2(colsC);
for (j=0; j<colsC; j++) {
rb_ary_store(rowC, j, DBL2NUM(matC[i * colsC + j]));
}
rb_ary_store(matrixC, i, rowC);
}
return matrixC;
}
Is there a better/quicker way to convert a Ruby array to a C array and vice versa?
No, there is no quicker way to convert a Ruby Array to a C structure. That's because a Ruby Array can contain a mixture of any other kind of Ruby object, many of which cannot be converted to a C double.
There is another option though: NArray. This is a very efficient way of dealing with numerical multi-dimensional arrays in Ruby. Converting from an NArray to C takes far less ceremony, but it is an entirely different way of doing things.
Some of it is a little complex. In summary . . .
Load the narray.h library in extconf.rb
Original version of this was from fftw3 gem (I have simplified a little):
require "mkmf"
require "narray"
narray_dir = File.dirname(Gem.find_files("narray.h").first) rescue $sitearchdir
dir_config('narray', narray_dir, narray_dir)
if ( ! ( have_header("narray.h") && have_header("narray_config.h") ) )
puts "Header narray.h or narray_config.h is not found."
exit(-1)
end
create_makefile( 'my_lib_name/my_lib_name' )
Cast input NArray objects to the data type you want to work with
Here's an example instance method that can access the NArray
VALUE example_narray_param( VALUE self, VALUE rv_narray ) {
// Cast the input to the data type you want - here 32-bit ints
volatile VALUE new_narray = na_cast_object(rv_narray, NA_LINT);
// NARRAY is the C struct interface to NArray data
struct NARRAY *na_items;
// This macro is NArray's equivalent of NUM2DBL, pointing na_items
// at the data
GetNArray( new_narray, na_items );
// row now points natively to the data
int * row = (int*) na_items->ptr;
For multi-dimensional arrays like your matrix, NArray uses a single pointer with multiplier offsets, similar to your matA[i * colsA + j] - going into full detail on this would be too long, but hopefully this is enough of a start to help you decide if this is the right solution for you.
I actually use this approach a lot in some personal projects. They are MIT licensed, so feel free to look through them and copy or re-use anything. This neural network layer class might contain some useful reference code.
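To make the flat-offset idea concrete, here is a minimal sketch in plain C of the multiplication step on buffers laid out like matA[i * colsA + j] in the question. It is independent of Ruby and NArray and makes no claim about NArray's own layout; the name matmul_flat is made up for this example.

```c
#include <stddef.h>

/* C (rowsA x colsB) = A (rowsA x colsA) * B (colsA x colsB),
   all stored row-major in flat buffers. */
void matmul_flat(const double *a, const double *b, double *c,
                 size_t rowsA, size_t colsA, size_t colsB)
{
    for (size_t i = 0; i < rowsA; i++) {
        for (size_t j = 0; j < colsB; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < colsA; k++)
                sum += a[i * colsA + k] * b[k * colsB + j];
            c[i * colsB + j] = sum;
        }
    }
}
```

Whatever the container, the multiplication itself only needs the three dimensions and the flat pointers.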

Best solution to represent Data[i,j] in c?

There is some pseudocode that I want to implement in C, but I am in doubt about how to implement part of it. The pseudocode is:
for every pair of states qi, and qj, i<j, do
D[i,j] := 0
S[i,j] := notzero
end for
Here i and j in qi and qj are subscripts.
How do I represent D[i,j] or S[i,j]? Which data structure should I use so that it is simple and fast?
You can use something like
int length= 10;
int i =0, j= 0;
int res1[10][10] = {0, }; //index is based on "length" value
int res2[10][10] = {0, }; //index is based on "length" value
and then
for (i =0; i < length; i++)
{
for (j =0; j < length; j++)
{
res1[i][j] = 0;
res2[i][j] = 1;//notzero
}
}
Here D[i,j] and S[i,j] are represented by res1[10][10] and res2[10][10], respectively. These are called two-dimensional arrays.
I guess struct will be your friend here depending on what you actually want to work with.
Struct would be fine if, say, pair of states creates some kind of entity.
Otherwise you could use a two-dimensional array.
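As a rough sketch of the struct idea, assuming each pair of states carries exactly the two values D and S from the pseudocode: the names pair_info, init_pairs and the size NSTATES are invented here, and only the entries with i < j are touched, as in the original loop.

```c
#include <stddef.h>

#define NSTATES 10          /* assumed number of states */

struct pair_info {
    int d;                  /* D[i,j] */
    int s;                  /* S[i,j]; nonzero stands for "notzero" */
};

/* Initialise the entries for every pair i < j; all other
   entries are left untouched. */
void init_pairs(struct pair_info table[NSTATES][NSTATES])
{
    for (size_t i = 0; i < NSTATES; i++) {
        for (size_t j = i + 1; j < NSTATES; j++) {
            table[i][j].d = 0;
            table[i][j].s = 1;
        }
    }
}
```

Keeping D and S side by side means one lookup per pair fetches both values; two separate int arrays work just as well if they are usually accessed independently.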
Added after the answer was accepted.
Depending on coding goals and platform, to get "simple and fast", using a pointer to pointer to a number may be faster than a 2-D array in C.
// 2-D array
double x[MAX_ROW][MAX_COL];
// Code computes the address in `x`, often involving a i*MAX_COL, if not in a loop.
// Slower when multiplication is expensive and random array access occurs.
x[i][j] = f();
// pointer to pointer of double
double **y = calloc(MAX_ROW, sizeof *y);
for (i=0; i<MAX_ROW; i++) y[i] = calloc(MAX_COL, sizeof *(y[i]));
// Code computes the address in `y` by a lookup of y[i]
y[i][j] = f();
Flexibility
The first data type is easy to pass around, e.g. print(x), when the array size is fixed, but becomes challenging otherwise.
The 2nd data type is easy to pass around, e.g. print(y, rows, columns), when the array size is variable, and of course it works well with fixed sizes too.
The 2nd data type also allows row swapping simply by swapping pointers.
So if code is using a fixed array size, use double x[MAX_ROW][MAX_COL]; otherwise I recommend double **y. YMMV.
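A minimal sketch of the pointer-to-pointer variant with allocation checks, the pointer-based row swap, and cleanup. The helper names matrix_alloc, matrix_swap_rows and matrix_free are made up for this example.

```c
#include <stdlib.h>

/* Allocate rows x cols zeroed doubles as an array of row pointers.
   Returns NULL on failure, after releasing anything already allocated. */
double **matrix_alloc(size_t rows, size_t cols)
{
    double **m = calloc(rows, sizeof *m);
    if (m == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        m[i] = calloc(cols, sizeof *m[i]);
        if (m[i] == NULL) {
            while (i > 0)
                free(m[--i]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Swap two rows in constant time by exchanging their pointers. */
void matrix_swap_rows(double **m, size_t r1, size_t r2)
{
    double *tmp = m[r1];
    m[r1] = m[r2];
    m[r2] = tmp;
}

/* Release every row, then the array of row pointers. */
void matrix_free(double **m, size_t rows)
{
    if (m == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        free(m[i]);
    free(m);
}
```

With a true 2-D array the same swap would have to copy a whole row of elements.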

Vectorized nested indexing

I have a for-loop to do indexing:
for (int i=0; i<N; i++){
a[i] = b[c[i]];
}
c are the indices of interest and are int *, while b and a are float * and the manipulated values.
But this takes a long time (and it shouldn't take that long). I'd like a vectorized version, most likely found in BLAS/LAPACK/etc.
I'm looking for nested_indexing(float * output_vector, float * input_vector, int * input_indices).
I've tried looking through the docs, but have not found anything.
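vDSP_vgathr (from Apple's Accelerate framework) does almost exactly this. It takes two float * vectors and an index vector, and does the equivalent of for (i=0; i<N; i++) a[i] = b[c[i]]. Two caveats from its documentation: the indices are of type vDSP_Length rather than int, and they are one-based, so an int * of zero-based indices has to be converted first.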
The wording they used was
Uses elements of vector B as indices to copy selected elements of vector A to sequential locations in vector C
It could be sequential indexing too, perhaps. I've noticed that the hardest part about finding these obscure functions is finding the right words to use in your searches.
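For platforms without Accelerate, a portable fallback is simply the loop itself. This sketch follows the nested_indexing signature from the question, plus a length parameter n that the question did not list (an assumption here). The restrict qualifiers only tell the compiler that the buffers do not overlap; whether the loop actually gets vectorized depends on the compiler and on the target having gather instructions.

```c
#include <stddef.h>

/* out[i] = in[idx[i]] for i in [0, n); indices are zero-based.
   `restrict` promises that the three buffers do not overlap. */
void nested_indexing(float *restrict out, const float *restrict in,
                     const int *restrict idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[idx[i]];
}
```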

Maintain a sorted array that a separate, iterative function can keep accessing

I'm writing code for a decision tree in C. Right now it gives me the correct result (0% training error, low test error), but it takes a long time to run.
The problem lies in how often I run qsort. My basic algorithm is this:
for every feature
sort that feature column using qsort
remove duplicate feature values in that column
for every unique feature value
split
determine entropy given that split
save the best feature to split + split value
for every training_example
if training_example's value for best feature < best split value, store in Left[]
else store in Right[]
recursively call this function, using only the Left[] training examples
recursively call this function, using only the Right[] training examples
Because the last two lines are recursive calls, and because the tree can extend for dozens and dozens of branches, the number of calls to qsort is huge (especially for my dataset, which has > 1000 features).
My idea to reduce the runtime is to create a 2d array (in a separate function) where each column is a sorted feature column. Then, as long as I maintain a vector of row numbers of the training examples in Left[] and Right[] for each recursive call, I can just call this separate function, grab the rows I want in the pre-sorted feature vector, and save the cost of having to qsort each time.
I'm fairly new to C, so I'm not sure how to code this. In MATLAB I can just have a global array that any function can change or access; I'm looking for something like that in C.
Global arrays in C are totally possible. There are actually two ways of doing that. In the first case the dimensions of the array are fixed for the application:
#define NROWS 100
#define NCOLS 100
int array[NROWS][NCOLS];
int main(void)
{
int i, j;
for (i = 0; i < NROWS; i++)
for (j = 0; j < NCOLS; j++)
{
array[i][j] = i+j;
}
return 0;
}
In the second example the dimensions may depend on values from the input.
#include <stdlib.h>
int **array;
int main(void)
{
int nrows = 100;
int ncols = 100;
int i, j;
array = malloc(nrows*sizeof(*array));
for (i = 0; i < nrows; i++)
{
array[i] = malloc(ncols*sizeof(*(array[i])));
for (j = 0; j < ncols; j++)
{
array[i][j] = i+j;
}
}
}
Although access to the arrays looks deceptively similar in both examples, the implementations are quite different. In the first example the array occupies one contiguous piece of memory, and the stride from one row to the next is a whole row. In the second example each row is reached through a pointer to that row, which is its own piece of memory, so the rows can be located in different areas of memory. In the second example rows may also have different lengths; in that case you would need to store the length of each row somewhere too.
I don't fully understand what you are trying to achieve, because I'm not familiar with the terminology of decision tree, feature and the standard approaches to training sets. But you may also want to have a look at other data structures to maintain sorted data:
http://en.wikipedia.org/wiki/Red–black_tree maintains a more or less balanced and sorted tree.
AVL tree a bit slower but more balanced and sorted tree.
Trie a sorted tree on lists of elements.
Hash function to easily map a complex element to an integral value that can be used to sort the elements. Good for finding exact elements, but there is no real order in the elements itself.
P.S1: Coming from Matlab you may want to consider a different language from C to move to. C++ has standard libraries to support above data structures. Java, Python come to mind or even Haskell if you are daring. Pointer handling in C can be quite tedious and error prone.
P.S2: I'm unable to include a - in a URL on StackOverflow. So the Red-black tree links is a bit off and can't be clicked. If someone can edit my post to fix it, then I would appreciate that.
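Coming back to the pre-sorting idea from the question, here is one way it could be sketched, under the assumption that each feature column is a plain array of doubles. Row numbers are sorted once per feature; at each split the sorted row list is divided into left and right while keeping its order, so neither child has to call qsort again. All names (presort_rows, split_sorted, go_left) are invented for this example.

```c
#include <stdlib.h>

/* qsort has no context argument, so the column being sorted is
   passed through a file-scope pointer (not thread-safe). */
static const double *sort_column;

static int cmp_by_column(const void *pa, const void *pb)
{
    double a = sort_column[*(const size_t *)pa];
    double b = sort_column[*(const size_t *)pb];
    return (a > b) - (a < b);
}

/* Fill order[0..n) with row numbers sorted by column[row].
   Done once per feature, before the tree is grown. */
void presort_rows(const double *column, size_t n, size_t *order)
{
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    sort_column = column;
    qsort(order, n, sizeof *order, cmp_by_column);
}

/* Split a sorted row list into left and right, keeping each side
   sorted. go_left[row] is nonzero for rows sent to the left child.
   Returns the number of rows written to left. */
size_t split_sorted(const size_t *order, size_t n,
                    const unsigned char *go_left,
                    size_t *left, size_t *right)
{
    size_t nl = 0, nr = 0;
    for (size_t i = 0; i < n; i++) {
        if (go_left[order[i]])
            left[nl++] = order[i];
        else
            right[nr++] = order[i];
    }
    return nl;
}
```

The split is a single linear pass per feature, and the left and right lists can be handed straight to the recursive calls.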

Copying a subset of an array into another array / array slicing in C

In C, is there any built-in array slicing mechanism?
Like in Matlab for example,
A(1:4)
would produce =
1 1 1 1
How can I achieve this in C?
I tried looking, but the closest I could find is this: http://cboard.cprogramming.com/c-programming/95772-how-do-array-subsets.html
subsetArray = &bigArray[someIndex]
But this does not exactly return the sliced array, instead pointer to the first element of the sliced array...
Many thanks
Doing that in std C is not possible. You have to do it yourself.
If you have a string, you can use the string.h library, which takes care of that, but for integers there's no library that I know of.
That said, once you have what you have, the point from which you want to start your subset, it is actually easy to implement.
Assuming you know the size of your 'main' array and that is an integer array, you can do this:
subset = malloc((arraySize-i)*sizeof(int)); //Where i is the place you want to start your subset.
for(j=i;j<arraySize;j++)
subset[j-i] = originalArray[j]; //subset is indexed from 0, so shift by i
Hope this helps.
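The same copy can also be written with memcpy, since the elements of a C array are contiguous. A small sketch, where slice_copy is a made-up helper name and the caller is assumed to pass a start and count that lie inside the source array:

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of src[start .. start+count), or NULL
   if allocation fails. The caller must free() the result. */
int *slice_copy(const int *src, size_t start, size_t count)
{
    int *dst = malloc(count * sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src + start, count * sizeof *dst);
    return dst;
}
```

Keep in mind that C indices start at 0, so Matlab's A(1:4) corresponds to slice_copy(A, 0, 4).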
Thanks everyone for pointing out that there is no such built-in mechanism in C.
I tried using what @Afonso Tsukamoto suggested, but I realized I needed a solution for a multi-dimensional array. So I ended up writing my own function. I will put it here in case anyone else is looking for a similar answer:
void GetSlicedMultiArray4Col(int A[][4], int mrow, int mcol, int B[1][4], int sliced_mrow)
{
int row, col;
sliced_mrow = sliced_mrow - 1; //cause in C, index starts from 0
for(row=0; row < mrow; row++)
{
for (col=0; col < mcol; col++)
{
if (row==sliced_mrow) B[0][col]=A[row][col];
}
}
}
So A is my input (original array) and B is my output (the sliced array).
I call the function like this:
GetSlicedMultiArray4Col(A, A_rows, A_cols, B, target_row);
For example:
int A[][4] = {{1,2,3,4},{1,1,1,1},{3,3,3,3}};
int A_rows = 3;
int A_cols = 4;
int B[1][4]; //my subset
int target_row = 1;
GetSlicedMultiArray4Col(A, A_rows, A_cols, B, target_row);
This will produce a result (multidimensional array B[1][4]) that in Matlab is equal to the result of A(target_row,1:4).
I am new to C so please correct me if I'm wrong or if this code can be made better... thanks again :)
In C, as far as I know, an array name is treated much like a const pointer to its first element, so the size of a subset is not tracked for you, and you cannot reassign an array to a new address. You can simply use a pointer into the original array instead, but you must manage the size of the subset yourself.
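One way to keep the pointer and the size together is a small "view" struct. This is only a sketch; int_slice and make_slice are invented names, and the view does not own or copy the data, so it is only valid while the original array is.

```c
#include <stddef.h>

/* A non-owning view: a pointer into an existing array plus a length. */
struct int_slice {
    int *data;
    size_t len;
};

/* View of base[start .. start+len); no elements are copied. */
struct int_slice make_slice(int *base, size_t start, size_t len)
{
    struct int_slice s = { base + start, len };
    return s;
}
```

For a 2-D array such as int A[3][4], a whole row can be viewed the same way with make_slice(A[1], 0, 4), because each row is contiguous. Writing through the view changes the original array.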
