R matrix memory representation (C)

I'm writing a plugin for R and I want to allocate a 3-dimensional R matrix to return. How can I do this? In Rinternals.h I see an allocMatrix and an alloc3DArray. Do I use one of those?
If it is too annoying, I can accept a matrix from the user, but I need to know what the internal representation is so that I can fill it in.
Thank you.

Two problems seem at issue: one is validating input from a user, and the other is allocation. I would be surprised if the .Call interface or an Rcpp strategy were much faster than just allocating with:
obj <- array(NA, dim=c(x,y,z)) # where the x,y and z values would be user input.
If you look at the code for array you see this as the likely workhorse function:
if (is.atomic(data) && !is.object(data))
return(.Internal(array(data, dim, dimnames)))

It's worth understanding that arrays in R are really just vectors with a dimension attribute set:
> x <- array(0, c(2, 2, 2))
> .Internal(inspect(x))
#7f859baf5ee8 14 REALSXP g0c4 [NAM(2),ATT] (len=8, tl=0) 0,0,0,0,0,...
ATTRIB:
#7f85a1d593c0 02 LISTSXP g0c0 []
TAG: #7f859c8043f8 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "dim" (has value)
#7f85a4040bc0 13 INTSXP g0c2 [NAM(2)] (len=3, tl=0) 2,2,2
So if you want to make a matrix or array 'by hand', it's as simple as allocating a vector of the correct length, filling in a dimension vector, and setting it as the dim attribute. E.g.:
SEXP myArray = PROTECT(allocVector(REALSXP, m * n * k));
SEXP myDims = PROTECT(allocVector(INTSXP, 3));
INTEGER(myDims)[0] = m; INTEGER(myDims)[1] = n; INTEGER(myDims)[2] = k;
setAttrib(myArray, R_DimSymbol, myDims);
UNPROTECT(2);


How to take irfft of multidimensional array in julia?

I am new to Julia, and I am trying to take the irfft of B, which is a 3D array of size (n/2, n, n) where B = rfft(A). However, irfft in Julia requires an additional input d for the size of the transformed real array, and I'm unsure of what to put. I tried n and n/2, but neither seemed to work as expected when I printed the resulting matrix.
EDIT: I should've lowered my dimensions to check if everything was working, turns out using d = n is ok. Thanks to everyone who answered!
Check out this discussion. Presumably any triple of numbers will do the trick, but may or may not give you what you want.
This should work:
using FFTW
function test(n = 16)
    a = rand(n ÷ 2, n, n)
    f = rfft(a)
    @show irfft(f, n ÷ 2 + 1)
end
test()

Seg faulting with 4D arrays & initializing dynamic arrays

I ran into a bit of a problem with a Tetris program I'm currently writing in C.
I am trying to use a 4D multi-dimensional array e.g.
uint8_t shape[7][4][4][4]
but I keep getting segfaults when I try that. I've read around, and it seems that I'm using up all the stack memory with this kind of array (all I'm doing is filling the array with 0s and 1s to depict a shape, so I'm not using some ridiculously large size).
Here is a version of it (on pastebin because as you can imagine its very ugly and long).
If I make the array smaller it seems to work but I'm trying to avoid a way around it as theoretically each "shape" represents a rotation as well.
https://pastebin.com/57JVMN20
I've read that you should use dynamic arrays so they end up on the heap, but then I run into the issue of how to initialize a dynamic array in such a way as linked above. It seems like it would be a headache, as I would have to go through loops and handle each shape individually.
I would also be grateful if anybody would let me pick their brain on dynamic arrays: how best to go about them, and whether it's even worth using plain arrays at all.
I have not understood why you use 4D arrays to store shapes for a Tetris game, but I agree with bolov's comment that such an array should not overflow the stack (7*4*4*4*1 = 448 bytes), so you should probably check the other code you wrote.
Now, to your question on how to manage 4D (N-dimensional) dynamically sized arrays. You can do this in two ways:
The first way consists in creating an array of (N-1)-Dimensional arrays. If N = 2 (a table) you end up with a "linearized" version of the table (a normal array) which dimension is equal to R * C where R is the number of rows and C the number of columns. Inductively speaking, you can do the very same thing for N-Dimensional arrays without too much effort. This method has some drawbacks though:
You need to know beforehand all the dimensions except one (the "latest") and all the dimensions are fixed. Back to the N = 2 example: if you use this method on a table of C columns and R rows, you can change the number of rows by allocating C * sizeof(<your_array_type>) more bytes at the end of the preallocated space, but not the number of columns (not without rebuilding the entire linearized array). Moreover, different rows must have the same number of columns C (you cannot have a 2D array that looks like a triangle when drawn on paper, just to get things clear).
You need to carefully manage the indices: you cannot simply write my_array[row][column]; instead you must access the array with my_array[row*C + column]. If N is not 2, this formula gets... interesting.
The second way consists in using N-1 levels of arrays of pointers. That's my favourite solution because it does not have any of the drawbacks of the previous one, although you need to manage pointers to pointers to pointers to ... to pointers to a type (but that's what you do when you access my_array[7][4][4][4]).
Solution 1
Let's say you want to build an N-Dimensional array in C using the first solution.
You know the length of each dimension of the array up to the (N-1)-th (let's call them d_1, d_2, ..., d_(N-1)). We can build this inductively:
We know how to build a dynamic 1-dimensional array
Supposing we know how to build a (N-1)-dimensional array, we show that we can build a N-Dimensional array by putting each (N-1)-dimensional array we have available in a 1-Dimensional array, thus increasing the available dimensions by 1.
Let's also assume that the data type that the arrays must hold is called T.
Let's suppose we want to create an array with R (N-1)-dimensional arrays inside it. For that we need to know the size of each (N-1)-dimensional array, so we need to calculate it.
For N = 1 the size is just sizeof(T)
For N = 2 the size is d_1 * sizeof(T)
For N = 3 the size is d_2 * d_1 * sizeof(T)
You can easily prove by induction that the number of bytes required to store R (N-1)-dimensional arrays is R * (d_1 * d_2 * ... * d_(N-1) * sizeof(T)). And that's done.
Now, we need to access a random element inside this massive N-dimensional array. Let's say we want to access the item with indices (i_1, i_2, ..., i_N). For this we repeat the inductive reasoning:
For N = 1, the index of the i_1 element is just my_array[i_1]
For N = 2, the index of the (i_1, i_2) element can be calculated by noting that a new row begins every d_1 elements, so the element is my_array[i_1 * d_1 + i_2].
For N = 3, we can repeat the same process and end up having the element my_array[d_2 * ((i_1 * d_1) + i_2) + i_3]
And so on.
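The inductive formula above can be sketched in a few lines of Python (the helper name linear_index is mine; d_1, ..., d_(N-1) are the sizes of every dimension except the first, as in the text):

```python
def linear_index(indices, inner_dims):
    """Flatten an N-D index (i_1, ..., i_N) into a 1-D offset.

    inner_dims is (d_1, ..., d_(N-1)); the recurrence is
    idx_{k+1} = idx_k * d_k + i_{k+1}, matching the formulas above.
    """
    idx = indices[0]
    for i_k, d_k in zip(indices[1:], inner_dims):
        idx = idx * d_k + i_k
    return idx

# 3-D example with inner sizes d_1 = 3, d_2 = 2:
d1, d2 = 3, 2
# (i_1, i_2, i_3) = (2, 1, 0) -> d_2 * ((i_1 * d_1) + i_2) + i_3 = 14
assert linear_index((2, 1, 0), (d1, d2)) == d2 * ((2 * d1) + 1) + 0
```

The same function works for any N, which is exactly the point of the inductive construction.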
Solution 2
The second solution wastes a bit more memory, but it's more straightforward, both to understand and to implement.
Let's just stick to the N = 2 case so that we can think better. Imagine to have a table and to split it row by row and to place each row in its own memory slot. Now, a row is a 1-dimensional array, and to make a 2-dimensional array we only need to be able to have an ordered array with references to each row. Something like the following drawing shows (the last row is the R-th row):
+------+
| R1 -------> [1,2,3,4]
|------|
| R2 -------> [2,4,6,8]
|------|
| R3 -------> [3,6,9,12]
|------|
| .... |
|------|
| RR -------> [R, 2*R, 3*R, 4*R]
+------+
In order to do that, you need to first allocate the references array (R elements long) and then, iterate through this array and assign to each entry the pointer to a newly allocated memory area of size d_1.
We can easily extend this for N dimensions. Simply build a R dimensional array and, for each entry in this array, allocate a new 1-Dimensional array of size d_(N-1) and do the same for the newly created array until you get to the array with size d_1.
Notice how you can easily access each element by simply using the expression my_array[i_1][i_2][i_3]...[i_N].
For example, suppose N = 3, T is uint8_t, and d_1, d_2 and d_3 are known (and initialized) in the following code:
size_t d1 = 5, d2 = 7, d3 = 3;
uint8_t ***my_array = malloc(d1 * sizeof(uint8_t **));
for (size_t x = 0; x < d1; x++) {
    my_array[x] = malloc(d2 * sizeof(uint8_t *));
    for (size_t y = 0; y < d2; y++) {
        my_array[x][y] = malloc(d3 * sizeof(uint8_t));
    }
}
// Accessing a random element
size_t x1 = 2, y1 = 6, z1 = 1;
my_array[x1][y1][z1] = 32;
I hope this helps. Please feel free to comment if you have questions.

Stacking copies of an array/ a torch tensor efficiently?

I'm a Python/PyTorch user. First, in numpy, let's say I have an array M of size LxL, and I want to have the following array: A = (M, ..., M) of size, say, NxLxL. Is there a more elegant/memory-efficient way of doing it than:
A = np.array([M]*N) ?
Same question with a torch tensor! Because now, if M is a Variable(torch.tensor), I have to do:
A = torch.autograd.Variable(torch.tensor(np.array([M]*N)))
which is ugly!
Note, that you need to decide whether you would like to allocate new memory for your expanded array or whether you simply require a new view of the existing memory of the original array.
In PyTorch, this distinction gives rise to the two methods expand() and repeat(). The former only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the stride to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory. In contrast, the latter copies the original data and allocates new memory.
In PyTorch, you can use expand() and repeat() as follows for your purposes:
import torch
L = 10
N = 20
A = torch.randn(L,L)
A.expand(N, L, L) # specifies new size
A.repeat(N,1,1) # specifies number of copies
In Numpy, there are a multitude of ways to achieve what you did above in a more elegant and efficient manner. For your particular purpose, I would recommend np.tile() over np.repeat(), since np.repeat() is designed to operate on the particular elements of an array, while np.tile() is designed to operate on the entire array. Hence,
import numpy as np
L = 10
N = 20
A = np.random.rand(L,L)
np.tile(A,(N, 1, 1))
If you don't mind creating new memory:
In numpy, you can use np.repeat() or np.tile(). With efficiency in mind, you should choose the one which organises the memory for your purposes, rather than re-arranging after the fact:
np.repeat([1, 2], 2) == [1, 1, 2, 2]
np.tile([1, 2], 2) == [1, 2, 1, 2]
In pytorch, you can use tensor.repeat(). Note: This matches np.tile, not np.repeat.
If you don't want to create new memory:
In numpy, you can use np.broadcast_to(). This creates a readonly view of the memory.
In pytorch, you can use tensor.expand(). This creates an editable view of the memory, so operations like += will have weird effects.
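As a minimal numpy sketch of the no-new-memory route (variable names are mine): the broadcast view has stride 0 along the repeated axis and is read-only.

```python
import numpy as np

L, N = 3, 4
M = np.arange(L * L).reshape(L, L)

# Stack N "copies" of M without allocating N*L*L elements:
A = np.broadcast_to(M, (N, L, L))

print(A.shape)            # (4, 3, 3)
print(A.strides[0])       # 0 -- the repeated axis reuses the same memory
print(A.flags.writeable)  # False -- the view is read-only
```

Writing through A raises an error, which is why the copying route (tile/repeat) is needed whenever you intend to modify the slices independently.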
In numpy, repeat is faster:
np.repeat(M[None, ...], N, 0)
This expands the dimensions of M and then repeats along the new dimension.

Smart and Fast Indexing of multi-dimensional array with R

This is another step of my battle with multi-dimensional arrays in R, previous question is here :)
I have a big R array with the following dimensions:
> data = array(..., dim = c(x, y, N, value))
I'd like to perform a sort of bootstrap comparing the mean (see here for a discussion about it) obtained with:
> vmean = apply(data, c(1,2,3), mean)
With the mean obtained by sampling the N values randomly with replacement. To explain better: if data[1,1,,1] equals [v1 v2 v3 ... vN], I'd like to replace it with something like [v_k1 v_k2 v_k3 ... v_kN], with the k values sampled with sample(N, N, replace = T).
Of course I want to AVOID a for loop. I've read this but I don't know how to perform an efficient indexing of this array avoiding a loop through x and y.
Any ideas?
UPDATE: the important thing here is that I want a different sample for each element of the fourth (value) dimension, otherwise it would be simple to do something like:
> dataSample = data[,,sample(N, N, replace = T), ]
There's also the compiler package, which speeds up for loops by using a just-in-time compiler.
Adding these lines at the top of your code enables the compiler for all code:
require("compiler")
compilePKGS(enable=T)
enableJIT(3)
setCompilerOptions(suppressAll=T)

Values of Variables Matrix NumPy

I'm working on a program that determines if lines intersect. I'm using matrices to do this. I understand all the math concepts, but I'm new to Python and NumPy.
I want to add my slope variables and yint variables to a new matrix. They are all floats. I can't seem to figure out the correct format for entering them. Here's an example:
import numpy as np
x = 2
y = 5
w = 9
z = 12
I understand that if I were to just be entering the raw numbers, it would look something like this:
matr = np.matrix('2 5; 9 12')
My goal, though, is to enter the variable names instead of the ints.
You can do:
M = np.matrix([[x, y], [w, z]])
# or
A = np.array([[x, y], [w, z]])
I included the array as well because I would suggest using arrays instead of matrices. Though matrices seem like a good idea at first (or at least they did for me), IMO you'll avoid a lot of headaches by using arrays. Here's a comparison of the two that will help you decide which is right for you.
The only disadvantage of arrays that I can think of is that matrix multiplication is not as pretty:
# With an array, matrix multiplication looks like this
matrix_product = array.dot(vector)
# With a matrix, it looks like this
matrix_product = matrix * vector
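For a concrete comparison (example values are mine; note that plain arrays also support the @ operator, which makes the array form just as tidy):

```python
import numpy as np

array = np.array([[1, 2], [3, 4]])
vector = np.array([5, 6])

# With an array, matrix multiplication is spelled out explicitly:
print(array.dot(vector))  # [17 39]
print(array @ vector)     # [17 39] -- equivalent, and just as readable

# With np.matrix, * means matrix multiplication (result is a 2-D column):
matrix = np.matrix([[1, 2], [3, 4]])
print(matrix * np.matrix(vector).T)
```

Given the @ operator, the "prettiness" advantage of np.matrix has largely disappeared, which is one more reason to prefer arrays.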
Can you just format the string like this?:
import numpy as np
x = 2
y = 5
w = 9
z = 12
matr = np.matrix('%s %s; %s %s' % (x, y, w, z))
