I need to perform basic calculations on arrays of different dimensionality.
The recommended solution seems to be to inflate all arrays to match size.
For example, take array B with dimensions 10x100 and array C with dimensions 10x10000. The result of an operation (e.g. multiplication) should be an array D of size 10x100x10000.
Therefore I would "inflate" B along the 10000 dimension and C along the 100 dimension, and then simply compute B*C.
Now, what is the best (fastest) way to achieve this inflation?
A slow method illustrating the desired outcome:
B <- array(dim=c(10,100),rnorm(n=10*100)) # small array
A <- array(dim=c(10,100,10000)) # creating empty big array
A[] <- B # "inflate" B into A, creating 10'000 copies of B
Ideas?
Idea 1
Just tried this, about 3x faster:
A <- rep(B,10000)
dim(A) <- c(10,100,10000)
Still searching..
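For what it's worth, array() itself recycles its data argument to fill the requested dimensions, so the rep() call can be dropped entirely; a minimal sketch (relative timings will vary by machine):

```r
# array() recycles its first argument, so B is replicated 10000 times
# along the new third dimension without an explicit rep() call
B <- array(rnorm(10 * 100), dim = c(10, 100))
A <- array(B, dim = c(10, 100, 10000))
identical(A[, , 1], B)  # TRUE: every slice along dim 3 is a copy of B
```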
I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:
function sort!(a, b)
    p = sortperm(a)
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
    sort!( a[I[i-1], I[i]-1], b[I[i-1], I[i]-1] )
end
However, already on a small example, I see that sort! does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to create such a function sort! (as efficient as possible) is welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays, Random
n=10^6; x=sprand(n,n,1000/n); # random matrix with ~1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); # changing entries to rationals
U = randperm(n) # permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a,b,I serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.
Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.
The problem is that a[1:5] is not a view, it's just a copy. Instead, construct the views explicitly:
function sort!(a, b)
    p = sortperm(a)
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
This is what you are looking for.
P.S. The @view a[2:3] and @view(a[2:3]) forms, or the @views macro, can help make things more readable.
First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).
Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them either with views or not. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.
I have an array of matrices.
dims <- c(10000,5,5)
mat_array <- array(rnorm(prod(dims)), dims)
I would like to perform a matrix-based operation (e.g. inversion via the solve function) on each matrix, but preserve the full structure of the array.
So far, I have come up with 3 options:
Option 1: A loop, which does exactly what I want, but is clunky and inefficient.
mat_inv <- array(NA, dims)
for(i in 1:dims[1]) mat_inv[i,,] <- solve(mat_array[i,,])
Option 2: The apply function, which is faster and cleaner, BUT squishes each matrix down to a vector.
mat_inv <- apply(mat_array, 1, solve)
dim(mat_inv)
[1] 25 10000
I know I can set the output dimensions to match those of the input, but I'm wary of doing this and messing up the indexing, especially if I had to apply over non-adjacent dimensions (e.g. if I wanted to invert across dimension 2).
Option 3: The aaply function from the plyr package, which does exactly what I want, but is MUCH slower (4-5x) than the others.
mat_inv <- plyr::aaply(mat_array, 1, solve)
Are there any options that combine the speed of base::apply with the versatility of plyr::aaply?
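One option, assuming you are comfortable reshaping the apply() output yourself: let apply() flatten each result, then restore the structure with array() and aperm(). This is a sketch rather than a definitive answer, and it relies on solve() returning matrices of the same size as its input:

```r
dims <- c(10000, 5, 5)
mat_array <- array(rnorm(prod(dims)), dims)

# apply() returns a 25 x 10000 matrix whose columns are the flattened inverses;
# reshape to 5x5x10000, then move the "which matrix" dimension back to the front
mat_inv <- aperm(array(apply(mat_array, 1, solve), dim = dims[c(2, 3, 1)]),
                 c(3, 1, 2))
all.equal(mat_inv[7, , ], solve(mat_array[7, , ]))  # TRUE
```

The aperm() permutation c(3, 1, 2) is what keeps the indexing consistent: element [i, , ] of the result corresponds to matrix [i, , ] of the input.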
I convert between data formats a lot. I'm sure this is quite common. In particular, I switch between arrays and lists. I'm trying to figure out if I'm doing it right, or if I'm missing any schemas that would greatly improve quality of life. Below I'll give some examples of how to achieve desired results in a couple situations.
Begin with the following array:
dat <- array(1:60, c(5,4,3))
Then, convert one or more of the dimensions of that array to a list. For clarification and current approaches, see the following:
1 dimension, array to list
# Convert 1st dim
dat_list1 <- unlist(apply(dat, 1, list),F,F) # this is what I usually do
# Convert 1st dim, (alternative approach)
library(plyr) # I don't use this approach often b/c I try to go base if I can
dat_list1a <- alply(dat, 1) # points for being concise!
# minus points to alply for being slow (in this case)
> microbenchmark(unlist(apply(dat, 1, list),F,F), alply(dat, 1))
Unit: microseconds
expr min lq mean median uq max neval
unlist(apply(dat, 1, list), F, F) 40.515 43.519 50.6531 50.4925 53.113 88.412 100
alply(dat, 1) 1479.418 1511.823 1684.5598 1595.4405 1842.693 2605.351 100
1 dimension, list to array
# Convert elements of list into new array dimension
# bonus points for converting to original array
dat_array1_0 <- simplify2array(dat_list1)
aperm.key1 <- sapply(dim(dat), function(x)which(dim(dat_array1_0)==x))
dat_array1 <- aperm(dat_array1_0,aperm.key1)
In general, these are the tasks I'm trying to accomplish, although sometimes it's in multiple dimensions or the lists are nested, or some such other complication. So I'm asking if anyone has a "better" (concise, efficient) way of doing either of these things, but bonus points if a suggested approach can handle other related scenarios too.
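One base-R option worth knowing, assuming your R version is at least 3.6.0: asplit() splits an array into a list along a margin in one call, and simplify2array() plus aperm() round-trips back:

```r
dat <- array(1:60, c(5, 4, 3))

# asplit() (base R >= 3.6.0): one 4x3 slice per index of dimension 1
dat_list <- asplit(dat, 1)

# round-trip: simplify2array() stacks along a new *last* dimension,
# so aperm() moves it back to the front
dat_back <- aperm(simplify2array(dat_list), c(3, 1, 2))
all(dat_back == dat)  # TRUE
```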
I have two arrays of 2x2 matrices, and I'd like to apply a function over each pair of 2x2 matrices. Here's a minimal example, multiplying each matrix in A by its corresponding matrix in B:
A <- array(1:20, c(5,2,2))
B <- array(1:20, c(5,2,2))
n <- nrow(A)
# Desired output: array with dimension 5x2x2 that contains
# the product of each pair of 2x2 matrices in A and B.
C <- aperm(sapply(1:n, function(i) A[i,,]%*%B[i,,], simplify="array"), c(3,1,2))
This takes two arrays, each with 5 2x2 matrices, and multiplies each pair of 2x2 matrices together, with the desired result in C.
My current code is this ugly last line, using sapply to loop through the first array dimension and pull out each 2x2 matrix separately from A and B. And then I need to permute the array dimensions with aperm() in order to have the same ordering as the original arrays (sapply(...,simplify="array") indexes each 2x2 matrix using the third dimension rather than the first one).
Is there a nicer way to do this? I hate that ugly function(i) in there, which is really just a way of faking a for loop. And the aperm() call makes this much less readable. What I have now works fine; I'm just searching for something that feels more like idiomatic R.
mapply() will take multiple lists or vectors, but it doesn't seem to work with arrays. aaply() from plyr is also close, but it doesn't take multiple inputs. The closest I've come is to use abind() with aaply() to pack A and B into one array and work with 2 matrices at once, but this doesn't quite work (it only gets the first two entries; somewhere my indexing is off):
aaply(.data=abind(A,B,along=0), 1, function(ab) ab[1,,]%*%ab[2,,])
And this isn't exactly cleaner or clearer anyway!
I've tried to make this a minimal example, but my real use case requires a more complicated function of the matrix pairs (and I'd also love to scale this up to more than two arrays), so I'm looking for something that will generalize and scale.
D <- aaply(abind(A, B, along = 4), 1, function(x) x[,,1] %*% x[,,2])
This is a working solution using abind and aaply.
Sometimes a for loop is the easiest to follow. It also generalizes and scales:
n <- nrow(A)
C <- A
for(i in 1:n) C[i,,] <- A[i,,] %*% B[i,,]
R's infrastructure for lists is much better (it seems) than for arrays, so I could also approach it by converting the arrays into lists of matrices like this:
A <- alply(A, 1, function(a) matrix(a, ncol=2, nrow=2))
B <- alply(B, 1, function(b) matrix(b, ncol=2, nrow=2))
mapply(function(a,b) a %*% b, A, B, SIMPLIFY=FALSE)
I think this is more straightforward than what I have above, but I'd still love to hear better ideas.
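If base R 3.6.0 or later is available, asplit() plus Map() gives another list-based route that needs no plyr, scales to any number of arrays, and takes any pairwise function (a sketch, not a benchmark):

```r
A <- array(1:20, c(5, 2, 2))
B <- array(1:20, c(5, 2, 2))

# asplit(A, 1) yields the five 2x2 matrices as a list; Map() applies %*% pairwise
C_list <- Map(`%*%`, asplit(A, 1), asplit(B, 1))

# simplify2array() stacks along a new last dimension; aperm() restores 5x2x2
C <- aperm(simplify2array(C_list), c(3, 1, 2))
all.equal(C[3, , ], A[3, , ] %*% B[3, , ])  # TRUE
```

Because Map() accepts any number of list arguments, extending this to three or more arrays only means adding more asplit() calls.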
Let's say that I have an array, foo, in R with dimensions c(150, 40, 30).
Now, if I:
bar <- apply(foo, 3, rbind)
dim(bar) is now c(6000, 30).
What is the most elegant and generic way to invert this procedure and go from bar to foo so that they are identical?
The trouble isn't getting the dimensions right, but getting the data back in the same order, within its respective original dimension.
Thank you for taking the time, I look forward to your responses.
P.S. For those thinking that this is part of a larger problem, it is, and no, I cannot use plyr, quite yet.
I think you can just call array again and specify the original dimensions:
m <- array(1:210,dim = c(5,6,7))
m1 <- apply(m, 3, rbind)
m2 <- array(as.vector(m1),dim = c(5,6,7))
all.equal(m,m2)
[1] TRUE
I'm wondering about your initial transformation. You call rbind from apply, but that won't do anything - you could just as well have called identity!
foo <- array(seq(150*40*30), c(150, 40, 30))
bar <- apply(foo, 3, rbind)
bar2 <- apply(foo, 3, identity)
identical(bar, bar2) # TRUE
So, what is it you really wanted to accomplish? I was under the assumption that you had a number of matrix slices (30 of them) and wanted to stack them and then unstack them again. If so, the code would be more involved than @joran suggested. You need some calls to aperm (as @Patrick Burns suggested):
# Make a sample 3 dimensional array (two 4x3 matrix slices):
m <- array(1:24, 4:2)
# Stack the matrix slices on top of each other
m2 <- matrix(aperm(m, c(1,3,2)), ncol=ncol(m))
# Reverse the process
m3 <- aperm(array(m2, c(nrow(m),dim(m)[[3]],ncol(m))), c(1,3,2))
identical(m3,m) # TRUE
In any case, aperm is really powerful (and somewhat confusing). Well worth learning...
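As a minimal illustration of what aperm() does (a toy example, not tied to the code above): for b <- aperm(a, perm), the result has dim(a)[perm] as its dimensions, and each element of a lands at the correspondingly permuted index of b.

```r
# aperm() generalizes t() to n dimensions
a <- array(1:24, c(2, 3, 4))
b <- aperm(a, c(3, 1, 2))    # dims become 4, 2, 3
dim(b)                       # 4 2 3
b[4, 2, 3] == a[2, 3, 4]     # TRUE: a[i, j, k] ends up at b[k, i, j]
```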