As noted in "Subsetting R array: dimension lost when its length is 1",
R drops every dimension whose length is 1 when subsetting.
The drop argument prevents that, but it applies to all dimensions at once.
I need a more flexible way to subset:
> arr = array(1, dim= c(1,2,3,4))
> dim(arr[,,1,])
[1] 2 4
> dim(arr[,,1,,drop=F])
[1] 1 2 1 4
I want a way to subset that drops the 3rd dimension (the dimension where I put the index 1) while keeping the 1st dimension (the dimensions where no index is given).
It should return an array with dimensions 1 2 4.
My issue is that I started coding with arrays that had no dimension of length 1, and the code crashes when it meets a case where a dimension has length 1. The function I need should let me treat the array as if that dimension were not 1.
There are two ways to do this: either use adrop from the abind package, or build a new array with the dimensions you choose after doing the subsetting.
library(abind)
arr <- array(sample(100, 24), dim=c(1, 2, 3, 4))
arr2 <- adrop(arr[ , , 1, , drop=FALSE], drop=3)
dim(arr2)
arr3 <- array(arr[ , , 1 , ], dim=c(1,2,4))
identical(arr2, arr3)
If you want a function that takes a single specified margin, and a single index of that margin, and drops that margin to create a new array with exactly one fewer margin, here is how to do it with abind:
specialsubset <- function(ARR, whichMargin, whichIndex) {
  library(abind)
  stopifnot(length(whichIndex) == 1, length(whichMargin) == 1, is.numeric(whichMargin))
  # Subset without dropping anything, then drop only the requested margin
  return(adrop(x = asub(ARR, idx = whichIndex, dims = whichMargin, drop = FALSE), drop = whichMargin))
}
arr4 <- specialsubset(arr, whichMargin=3, whichIndex=1)
identical(arr4, arr2)
I have a very large array (1955x2417x1) in R where each position stores a list of two vectors (named "max" and "min") of length 5.
I would like to find a simple way to create a multidimensional array (dim 1955x2417x5) where each position holds a single value from the vector "max".
I have looked at answers such as array of lists in r
but so far without success.
I know I can access the list in each position of the array using
myarray[posX, PosY][[1]][["max"]]
but how to apply that to the whole Array?
So far I have tried
newArray <- array( unlist(myarray[][[1]][["max"]]), c(1955, 2417, 5))
and
NewArray <-parApply(cl, myarray, c(1:2), function(x) {
a=x[[1]][["max"]]
} )
but the results are not right.
Do you have any suggestion?
Let
e <- list(min = 1:3, max = 4:6)
arr <- array(list(e)[rep(1, 8)], c(2, 4))
dim(arr)
# [1] 2 4
Then one option is
res <- apply(arr, 1:2, function(x) x[[1]][["max"]])
dim(res)
# [1] 3 2 4
and, if the order of dimensions matters, aperm moves the extracted vector to the last dimension:
dim(aperm(res, c(2, 3, 1)))
# [1] 2 4 3
As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula that computes the window sizes yields a pattern that is best handled with arrays (in my opinion). For simplicity's sake, let's say the indices denoting the window sizes are the list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)

def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[1], data[2] ),
                  max( data[3], data[4] ),
                  max( data[5], data[6] ),
                  .
                  .
                  .
                  max( data[39], data[40] )
        for k rows, where k = len(my_array)//num_col
    """
    # pass the argument through rather than the global 'data'
    my_array = shapeshifter(num_col=num_col, my_array=my_array)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
        such that looper(2^j) ==> shapeshifter(2^j)
        for j in j_list
    """
    res = []
    for col in col_ls:
        res.append(looper(num_col=col, my_array=my_array))
    return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns, the array cannot be reshaped without clipping data since 40/16 ≠ integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cut off the trailing values that do not fill a complete window. If this is not possible, I am hoping I can append zeroes to pad the array out to a size that can be reshaped, so that I can remove the zeroes afterwards. Or maybe even some complicated if / try / break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f, that will give you an array of maximums for your windows.
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns).
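As a quick check of the np.split approach, here is a minimal sketch that uses the 40-point data from the question with a window of 16, which does not divide 40 evenly. The names window, chunks, chunk_lengths and chunk_maxima are only illustrative.

```python
import numpy as np

data = np.linspace(1, 40, 40)
window = 16

# Split points at 16 and 32 give two full chunks and a ragged tail of 8 values.
chunks = np.split(data, range(window, data.size, window))
chunk_lengths = [len(c) for c in chunks]
chunk_maxima = [float(c.max()) for c in chunks]

print(chunk_lengths)  # [16, 16, 8]
print(chunk_maxima)   # [16.0, 32.0, 40.0]
```

The last chunk is simply shorter, so no values are lost and no padding is needed.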
If you really want to pad with zeros, you can use np.pad:
def shapeshifter(num_col, my_array=data):
    # (-size) % num_col is the number of zeros needed to fill the last row (0 if it already fits)
    return np.pad(my_array, (0, (-my_array.size) % num_col), 'constant', constant_values = 0).reshape(-1, num_col)
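To see how the padding width works out, the sketch below pads the 40-point data up to the next multiple of the window size. The helper name pad_to_multiple is hypothetical; it assumes a 1-D input and uses (-size) % num_col so that nothing is appended when the array already fits.

```python
import numpy as np

def pad_to_multiple(arr, num_col):
    """Hypothetical helper: zero-pad a 1-D array so its length is a multiple of num_col."""
    pad_width = (-arr.size) % num_col  # 0 when arr.size is already a multiple
    padded = np.pad(arr, (0, pad_width), mode="constant", constant_values=0)
    return padded.reshape(-1, num_col)

data = np.linspace(1, 40, 40)
padded_16 = pad_to_multiple(data, 16)  # 40 values padded to 48
padded_8 = pad_to_multiple(data, 8)    # 40 is already a multiple of 8

print(padded_16.shape)  # (3, 16)
print(padded_8.shape)   # (5, 8)
```

Keep in mind that the padded zeros take part in any later reduction, which matters if the data can be negative.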
Warning:
It is also technically possible to use, for example, a.resize(32,2), which resizes the ndarray in place and pads it with zeros (as you requested). But there are some big caveats:
You would need to calculate the second axis because -1 tricks don't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
The resize function (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it loops back to the beginning of the data.
Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
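The difference between the two resize variants can be seen on a small array. This is only a sketch: the array values are made up, and refcheck=False is passed so the in-place call does not depend on how many references the interpreter happens to hold.

```python
import numpy as np

a = np.arange(1, 6)  # [1 2 3 4 5]

# The function form returns a new array and repeats the data to fill it.
repeated = np.resize(a, 8)

# The method form works in place and fills the new entries with zeros.
# refcheck=False skips the reference check; that is only safe here because
# nothing else shares the memory of this fresh copy.
b = a.copy()
b.resize((8,), refcheck=False)

print(repeated.tolist())  # [1, 2, 3, 4, 5, 1, 2, 3]
print(b.tolist())         # [1, 2, 3, 4, 5, 0, 0, 0]
```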
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
    tail = - (a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis = -1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis = -1), f(a[tail:])]
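If only the maximum is needed, the reduceat method of NumPy ufuncs is another option that handles a ragged tail without a Python loop. This is a different technique from windowFunc2, sketched under the assumption that the reduction is always a maximum over a 1-D array; window_max is an illustrative name.

```python
import numpy as np

def window_max(a, window):
    """Illustrative helper: maximum of each consecutive window, ragged tail included."""
    starts = np.arange(0, a.size, window)  # start index of every window
    # reduceat reduces a[starts[k]:starts[k+1]], and a[starts[-1]:] for the last one
    return np.maximum.reduceat(a, starts)

data = np.linspace(1, 40, 40)
result_16 = window_max(data, 16)
result_4 = window_max(data, 4)

print(result_16.tolist())  # [16.0, 32.0, 40.0]
print(len(result_4))       # 10
```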
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
Your shapeshifter could use this in place of reshape.
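As a usage sketch with the question's 40-point data and a window of 16 (the function is repeated so the snippet runs on its own; truncated and row_maxima are illustrative names):

```python
import numpy as np

def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape:  # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)

data = np.linspace(1, 40, 40)

# 40 // 16 * 16 == 32, so the last 8 values are cut off before reshaping.
truncated = reshape_and_truncate(data, (-1, 16))
row_maxima = truncated.max(axis=1).tolist()

print(truncated.shape)  # (2, 16)
print(row_maxima)       # [16.0, 32.0]
```

Unlike the np.split and padding approaches, the values 33 to 40 are dropped here, so the final partial window is not reported.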
I am working with 3D arrays. A function takes a 2D array slice (matrix) from the user and visualizes it, using row and column names (the corresponding dimnames of the array). It works fine if the array dimensions are > 1.
However, if I have 1x1x1 array, I cannot extract the slice as a matrix:
a <- array(1, c(1,1,1), list(A="a", B="b", C="c"))
a[1,,]
[1] 1
It is a scalar with no dimnames, hence part of the necessary information is missing. If I add drop=FALSE, I don't get a matrix but retain the original array:
a[1,,,drop=FALSE]
, , C = c

   B
A   b
  a 1

The dimnames are here but it is still 3-dimensional. Is there an easy way to get a matrix slice from 1x1x1 array that would look like the above, just without the third dimension:
   B
A   b
  a 1
I suspect the issue is that when indexing an array, we cannot distinguish between 'take 1 value' and 'take all values' in case where 'all' is just a singleton...
The drop parameter of [ is all-or-nothing, but the abind package has an adrop function which will let you choose which dimension you want to drop:
abind::adrop(a, drop = 3)
##    B
## A   b
##   a 1
Without any extra packages, the best I could do was to use apply and return the sub-array:
apply(a, 1:2, identity)
# or
apply(a, 1:2, I)
#    B
#A   b
#  a 1
How can I reshape a 2d array to a 3d array with the last column being used as pages?
All data found in array2d should be in pages
example:
array2d=[7,.5,12; ...
1,1,1; ...
1,1,1; ...
4,2,4; ...
2,2,2; ...
2,2,2; ...
3,3,3; ...
3,3,3; ...
3,3,3];
The first page in the array would be
7,.5,12;
1,1,1;
1,1,1;
The second page in the array would be
4,2,4;
2,2,2;
2,2,2;
The third page in the array would be
3,3,3;
3,3,3;
3,3,3;
This is a 9x3 array. How can I turn it into a 9x3x? multidimensional array? (I am not sure what the last number should be, so I used a question mark as a placeholder.)
What I'm trying to get is all the ones on one dimension/page, all the twos on another dimension/page, etc.
I tried reshape(array2d,[9,3,1]) and it's still 9x3.
Use permute with reshape -
N = 3; %// Cut after every N rows to form a "new page"
array3d = permute(reshape(array2d,N,size(array2d,1)/N,[]),[1 3 2]) %// output
Assuming that each slice of your matrix is the same in dimensions, we can do this very easily. Let's call the number of rows and columns that each slice would have to be M and N respectively. In your example, this would be M = 3 and N = 3. As such, assuming array2d is of the above form, we can do the following:
M = 3;
N = 3; %// This is also simply the total number of columns we have,
%// so you can do size(array2d, 2);
outMatrix = []; %// Make this empty. We will populate as we go.
%// Figure out how many slices we need
numRows = size(array2d,1) / M;
for k = 1 : numRows
    %// Extract the k'th slice
    %// Reshape so that it has the proper dimensions
    %// of one slice
    sliceK = reshape(array2d(array2d == k), M, N);
    %// Concatenate in the third dimension
    outMatrix = cat(3,outMatrix,sliceK);
end
With your example, we thus get:
>> outMatrix
outMatrix(:,:,1) =
1 1 1
1 1 1
1 1 1
outMatrix(:,:,2) =
2 2 2
2 2 2
2 2 2
outMatrix(:,:,3) =
3 3 3
3 3 3
3 3 3
This method should generalize for any number of rows and columns for each slice, provided that each slice shares the same dimensions.
Your array is already of size 1 in the 3rd dimension (in other words, it is already 9x3x1, to prove this try entering array2d(1,1,1)). If you want to concatenate 2d matrices along the 3rd dimension you can use cat.
For example:
a = [1,2;3,4];
b = [5,6;7,8];
c = cat(3,a,b);
c will be a 2x2x2 matrix.
This piece of code is specific to this example, but I hope it shows how to adapt the approach to other data samples.
out2 = [];
col = size(array2d,2);
for i = 1:3
temp2 = reshape(array2d(array2d == i),[],col);
out2 = cat(3,out2,temp2);
end