Determining the length of a list in R - arrays

I have a list where each element is an array. Example data:
set.seed(24)
data <- list(
  individual1 = array(rnorm(3 * 3 * 2, 60), dim = c(3, 3, 2),
                      dimnames = list(NULL, NULL, c("rep1", "rep2"))),
  individual2 = array(rnorm(3 * 3 * 2, 60), dim = c(3, 3, 2),
                      dimnames = list(NULL, NULL, c("rep1", "rep2")))
)
I would like to find the length of the entire list. However, when I use length, I get 2, whereas I want 4 because there are 4 arrays. Is there another way to determine length for my question?
>length(data)
2

Like this?
sum(sapply(data, function(x) dim(x)[3]))
#[1] 4
Explanation: Your list in fact only contains 2 elements. The dimension of every list element is
lapply(data, dim)
#$individual1
#[1] 3 3 2
#
#$individual2
#[1] 3 3 2
In other words, every list element holds two 3x3 arrays. We can therefore get the total number of 3x3 arrays in the list by summing the size of the third dimension over all list elements.
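The same counting logic can be sketched in Python. This is only an illustration: the `dims` dictionary is a hypothetical stand-in for the output of lapply(data, dim) shown above and is not part of the original answer.

```python
# Hypothetical stand-in for lapply(data, dim): one shape tuple per list element.
dims = {
    "individual1": (3, 3, 2),
    "individual2": (3, 3, 2),
}

# Number of top-level elements, which is what length(data) reports.
n_elements = len(dims)

# Number of 3x3 slices: add up the size of the third dimension of each element.
n_slices = sum(shape[2] for shape in dims.values())

print(n_elements, n_slices)
```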

Related

Separating a matrix into sub-matrices in Fortran

Suppose I have a 2-D array such that the first column is composed of only two integers 1 and 2:
1 5 1 7 0.5
2 4 5 6 0.1
1 9 3 4 0.6
2 8 7 2 0.2
I want to separate two matrices out of this, such that the first column of each contains the same integer (so the first column of first matrix contains only integer 1, same goes for 2 in the second matrix).
So it would become:
1 5 1 7 0.5
1 9 3 4 0.6
and
2 4 5 6 0.1
2 8 7 2 0.2
I don't know exactly how to start. I was thinking of using count at the beginning (my real matrix is much larger, with 10 different integers in the first column) and then using the count for each integer to set the dimensions of each sub-matrix. After that, the only thing I could think of is count(mask): if the mask value is true, the row is added to the matrix by an if statement.
You can't have mixed types (integer and real) in the same array in Fortran, so I will assume all data in the 2-D array are real:
program split
  implicit none
  real, allocatable :: a(:, :), b(:, :)
  integer :: i, ids = 10
  integer, allocatable :: id(:), seq(:)
  ! reshape fills column by column, so each input line becomes one column of a (a is 5 x 4)
  a = reshape([real :: 1, 5, 1, 7, 0.5, &
             & 2, 4, 5, 6, 0.1, &
             & 1, 9, 3, 4, 0.6, &
             & 2, 8, 7, 2, 0.2], [5, 4])
  seq = [(i, i = 1, size(a, 2))]
  do i = 1, ids
    print*, "i = ", i
    ! build a vector with the indices of all lines that start with i
    ! e.g. for i = 1 we get id = [1, 3], for i = 2 we get [2, 4], for i = 3 we get [], ...
    id = pack(seq, a(1,:) == i)
    ! use a Fortran feature called vector subscripts to gather those lines
    b = a(:, id)
    print*, b
  end do
end
If you want the first column (or any column) to be integer, you can declare it as a separate array and use the same vector subscripts to gather the desired lines.
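For comparison, the grouping idea can be sketched in Python with a single pass over the rows. This is a sketch under assumptions, not a translation of the Fortran answer: `split_by_first_column` is a hypothetical helper name, and the rows are stored as plain lists.

```python
from collections import defaultdict

# Example data from the question, one list per matrix row.
rows = [
    [1, 5, 1, 7, 0.5],
    [2, 4, 5, 6, 0.1],
    [1, 9, 3, 4, 0.6],
    [2, 8, 7, 2, 0.2],
]

def split_by_first_column(rows):
    """Group rows into sub-matrices keyed by the value in their first column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return dict(groups)

sub = split_by_first_column(rows)
for key in sorted(sub):
    print(key, sub[key])
```

Unlike the loop over a fixed number of ids, this version does not need to know in advance how many distinct integers appear in the first column.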

Apply a replacement to each 3rd dimension of a 3D array

I have an m x n x p (3D) array A. I have a 2D matrix B with values ranging from 1:m in the first column and 1:n in the second column. What I'd like to do is set to NA the positions given by B in each slice along the third dimension. So,
for (i in 1:p) {
  A[,,i][B] = NA
}
Is there a way to do this without a for loop? I was thinking something like
A_NA = apply(A,3,function(x) x[B] = NA)
But that doesn't work.
The function has to return x (an assignment on its own returns only the assigned value, NA), and the result must then be assigned back to A:
A[] <- apply(A,3,function(x) { x[B] = NA; x})
Checking with the OP's solution
for (i in 1:p) {
  A1[,,i][B] = NA
}
identical(A, A1)
#[1] TRUE
data
A <- array(1:40, c(5, 4, 2) )
B <- cbind(c(1, 2, 3, 4), c(2, 3, 1, 1))
p <- 2
A1 <- A
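The effect of the replacement can also be sketched in plain Python. The assumptions here are mine: nested lists stand in for the R array, None plays the role of NA, and the values are generated in R's column-major order so that they mirror array(1:40, c(5, 4, 2)).

```python
m, n, p = 5, 4, 2

# Nested lists A[i][j][k], filled in column-major order like array(1:40, c(5, 4, 2)).
A = [[[1 + i + m * j + m * n * k for k in range(p)] for j in range(n)] for i in range(m)]

# 1-based (row, column) pairs, as in B.
B = [(1, 2), (2, 3), (3, 1), (4, 1)]

# Blank out each (row, column) position in every slice along the third dimension.
for r, c in B:
    for k in range(p):
        A[r - 1][c - 1][k] = None  # None stands in for NA

# Count the blanked cells: one per pair in B and per slice.
n_missing = sum(v is None for plane in A for row in plane for v in row)
print(n_missing)
```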

Sum values using Arrays and INDEX

I have the following sample sheet:
1/A  B       C  D  E  F  G  H  I  J
2
3    Points  8  4  2  1
4
5    Values  1  2  3  4  4  3  1  2
I'm trying to sum the 'Points' based upon the array index from the 'Values'.
My expected result from this is: 30
Here is my formula:
{=SUM(INDEX($C$3:$F$3,1,C5:J5))}
For some reason though, this only returns the first value of the array, rather than the entire sum.
To clarify, the C# version would be something like:
using System.Linq;

var points = new int[] { 8, 4, 2, 1 };
var values = new int[] { 1, 2, 3, 4, 4, 3, 1, 2 };
// Subtract 1 because C# arrays are 0-based, while '4' is a valid position in Excel
var result = (from v in values
              select points[v - 1]).Sum();
Edit: another example to clarify.
Points is the array. Each entry in 'Values' is an index into that array, and the points at those indices are summed.
The example above is the same as:
=SUM(8, 4, 2, 1, 1, 2, 8, 4)
INDEX will not take its row or column argument from an array and then evaluate once per element within a single-cell array formula. OFFSET is needed for this.
Either
{=SUM(N(OFFSET($C$3,,C5:J5-1)))}
as an array formula.
Or
=SUMPRODUCT(N(OFFSET($C$3,,C5:J5-1)))
as an implicit array formula without the need for [Ctrl]+[Shift]+[Enter].
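As a cross-check of the expected result, the lookup-and-sum can be sketched in Python. The lists below simply copy the sheet ranges C3:F3 and C5:J5, and the variable names are illustrative.

```python
points = [8, 4, 2, 1]              # C3:F3
values = [1, 2, 3, 4, 4, 3, 1, 2]  # C5:J5, 1-based positions into points

# Look up each 1-based position (Python lists are 0-based) and add the results.
total = sum(points[v - 1] for v in values)
print(total)  # 30
```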

Removing elements from a cell array in MATLAB

I have a cell array as shown below:
a = {[1 2 3] [5 3 6] [9 1 3]};
Now I want to remove the 1s from every array in a that contains 1 so that the output is as shown
a = {[2 3] [5 3 6] [9 3]};
I know the indices of the arrays in cell array 'a' that contain 1. This can be done with a for loop and a temporary variable, but that takes a lot of time: I want to perform the operation on a cell array of size roughly 1x100000 (the one above is just an example).
I want to know if there is any direct method that can do this quickly.
Pretty much anything is going to be slow with that large of a cell array. You could try to do this with cellfun but it's not necessarily guaranteed to be any faster than a for loop.
a = cellfun(@(x) x(x ~= 1), a, 'UniformOutput', false);
% a{1} =
% 2 3
% a{2} =
% 5 3 6
% a{3} =
% 9 3
As already commented by Suever, because you are using a cell array and it is a dynamic container, you don't have a choice but to iterate through each cell if you want to modify the contents. Just to be self-contained, here is the for loop approach to do things:
for ii = 1 : numel(a)
    a{ii} = a{ii}(a{ii} ~= 1);
end
This may be faster because it avoids the overhead of cellfun. The code above accesses the vector in each cell, extracts the values that are not equal to 1, and overwrites the cell with the new vector.
Using your example:
a = {[1 2 3] [5 3 6] [9 1 3]};
We get:
>> format compact; celldisp(a)
a{1} =
2 3
a{2} =
5 3 6
a{3} =
9 3
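For readers more familiar with Python, the same per-cell filtering can be sketched with a nested list comprehension. A list of lists stands in for the cell array here, which is my assumption rather than part of either answer.

```python
# A list of lists plays the role of the MATLAB cell array.
a = [[1, 2, 3], [5, 3, 6], [9, 1, 3]]

# Rebuild every inner list without the 1s; lists that contain no 1 are copied unchanged.
a = [[x for x in vec if x != 1] for vec in a]
print(a)  # [[2, 3], [5, 3, 6], [9, 3]]
```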
This example shows how to remove data from individual cells, and how to delete entire cells from a cell array. To run the code in this example, create a 3-by-3 cell array:
C = {1, 2, 3; 4, 5, 6; 7, 8, 9};
Delete the contents of a particular cell by assigning an empty array to the cell, using curly braces for content indexing, {}:
C{2,2} = []
This code returns
C =
[1] [2] [3]
[4] [] [6]
[7] [8] [9]
Delete sets of cells using standard array indexing with smooth parentheses, (). For example, this command
C(2,:) = []
removes the second row of C:
C =
[1] [2] [3]
[7] [8] [9]

Array sorting using presorted ranking

I'm building a decision tree algorithm. Sorting is very expensive in this algorithm because for every split I need to sort each column. So at the beginning, even before tree construction, I presort the variables: I create a matrix that stores, for each column, the ranking of its values. Then, when I want to sort a variable in some split, I don't actually sort it but use the presorted ranking array. The problem is that I don't know how to do this in a space-efficient manner.
A naive solution is below. It covers only 1 variable (v) and 1 split (split_ind).
import numpy as np
v = np.array([60,70,50,10,20,0,90,80,30,40])
sortperm = v.argsort() #1 sortperm = array([5, 3, 4, 8, 9, 2, 0, 1, 7, 6])
rankperm = sortperm.argsort() #2 rankperm = array([6, 7, 5, 1, 2, 0, 9, 8, 3, 4])
split_ind = np.array([3,6,4,8,9]) # this is my split (random)
# split v and sortperm
v_split = v[split_ind] # v_split = array([10, 90, 20, 30, 40])
rankperm_split = rankperm[split_ind] # rankperm_split = array([1, 9, 2, 3, 4])
vsorted_dummy = np.ones(10)*-1 #3 allocate "empty" array[N]
vsorted_dummy[rankperm_split] = v_split
vsorted = vsorted_dummy[vsorted_dummy!=-1] # vsorted = array([ 10., 20., 30., 40., 90.])
Basically I have 2 questions:
Is double sorting necessary to create ranking array? (#1 and #2)
In line #3 I'm allocating array[N]. This is very inefficient in terms of space, because even if the split size n << N I have to allocate the whole array. The problem here is how to calculate rankperm_split. In the example the original rankperm_split = [1,9,2,3,4], while it should really be [1,5,2,3,4]. This problem can be reformulated: I want to create a "dense" integer array that has a maximum gap of 1 and keeps the ranking of the array intact.
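On the first question, one possible sketch (not taken from any posted answer) avoids the second sort by inverting the sort permutation in a single linear pass. It is written in plain Python so that it runs without NumPy; the names mirror those in the code above.

```python
v = [60, 70, 50, 10, 20, 0, 90, 80, 30, 40]

# One sort: sortperm[r] is the index of the element that has rank r.
sortperm = sorted(range(len(v)), key=v.__getitem__)

# Invert the permutation in O(N) instead of sorting a second time:
# the element at index sortperm[r] receives rank r.
rankperm = [0] * len(v)
for rank, idx in enumerate(sortperm):
    rankperm[idx] = rank

print(sortperm)  # [5, 3, 4, 8, 9, 2, 0, 1, 7, 6]
print(rankperm)  # [6, 7, 5, 1, 2, 0, 9, 8, 3, 4]
```

The result matches the two argsort calls marked #1 and #2, assuming the values in v are distinct.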
UPDATE
I think that the second point is the key here. This problem can be redefined as
A[N] - array of size N
B[N] - array of size N
I want to transform array A to array B so that:
Ranking of the elements stays the same (for each pair i,j, if A[i] < A[j] then B[i] < B[j])
Array B has only elements from 1 to N where each element is unique.
A few examples of this transformation:
[3,4,5] => [1,2,3]
[30,40,50] => [1,2,3]
[30,50,40] => [1,3,2]
[3,4,50] => [1,2,3]
A naive implementation (with sorting) can be defined like this (in Python)
def remap(a):
    a_ = sorted(a)
    # position of each element in the sorted list, shifted to start at 1
    b = [a_.index(e)+1 for e in a]
    return b
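The naive version above calls index once per element, which makes it quadratic. A possible alternative, sketched here under the assumption that all elements are distinct, sorts the indices once and writes the ranks back directly. The name `remap_sorted_once` is hypothetical.

```python
def remap_sorted_once(a):
    """Return 1-based dense ranks of a, assuming all elements are distinct."""
    # Indices of a, ordered by the values they point to.
    order = sorted(range(len(a)), key=a.__getitem__)
    b = [0] * len(a)
    # The index with the smallest value gets rank 1, the next one rank 2, and so on.
    for rank, idx in enumerate(order, start=1):
        b[idx] = rank
    return b

print(remap_sorted_once([30, 50, 40]))      # [1, 3, 2]
print(remap_sorted_once([1, 9, 2, 3, 4]))   # [1, 5, 2, 3, 4]
```

Applied to rankperm_split from the question, this yields the dense ranking [1, 5, 2, 3, 4] using only arrays of the split size n. It still sorts the n elements of the split, so whether it helps depends on how that cost compares with allocating the full array[N].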
